Open ASLeonard opened 2 years ago
Hi @ASLeonard, would you please provide a (little if possible) reproducible example of your problem?
Even a subsetted og file was quite large, so if you need it I can provide it elsewhere. But the steps are
odgi paths -i pggb.smooth.og -L -l
#UCD 1 51098607
#Angus 1 51138358
#Highland 1 51174331
#OBV 1 52178328
#Brahman 1 51254507
cat breeds.txt
#UCD
#Angus
#Highland
#OBV
#Brahman
cat regions.bed
#UCD 1000 10000
#UCD 200000 3000000
odgi position -i pggb.smooth.og -R breeds.txt -b regions.bed
#UCD 1000 10000 UCD,1000,+ UCD,10000,+ +
#UCD 200000 3000000 UCD,200000,+ UCD,3000000,+ +
Where I would expect the additional lines
...
#UCD 200000 3000000 Highland,422079,+ Highland,3211922,+ +
...
which I can get by manually running
for breed in UCD Angus Highland...
do
odgi position -i pggb.smooth.og -r $breed -b regions.bed
done
Was looking a bit more into this but didn't make much progress. But here is some more information
If you don't provide -r
/-R
, all paths do seem to be available (i.e. ref_path_set
=#paths). However, there are still only 2 * number of bed lines calls (one for start and one for end) of get_position
, which seems to produce a single line of output. I'm not very clear on the step_handle/path_handle, so I'm not sure if it is even possible for a single call of get_position
to generate the bed coordinate lifted into all reference paths.
If I commented out this line https://github.com/pangenome/odgi/blob/984aee9d2d0649cc73d0151de3c268ade2988958/src/subcommand/position_main.cpp#L651
I could at least get all the path names printed, but probably would require some work to get that as printing out the lifted over positions for them all.
This is all done using the latest docker image.
I'm trying to lift coordinates through a pggb graph from one assembly to another with
odgi position
. This works fine for a given path (where regions.bed contains coordinates for ASM_1)However, the documentation leads me to believe using
-R ref_paths.txt
containing ASM_2, ASM_3, ASM_4, etc. would then give output for every bed position for every ref_path in the-R
file. However, it still only takes the first entry from the ref_paths. Similarly in the source code for odgi position, it seems to suggest not providing any ref_path name or file should default to using all paths in the graph https://github.com/pangenome/odgi/blob/bd54ab80ea03d864c82bd05599bf1dfa1777a72f/src/subcommand/position_main.cpp#L137-L139Am I misunderstanding the approach here? Currently the working solution is just to invoke N calls of odgi position to liftover the entire bed file for the N other assemblies, but that seems inefficient.
Thanks, Alex