Open mnshgl0110 opened 1 year ago
(reposting here, as replying to the email seems not to have worked)
Hi Manish,
I got the same error message yesterday when I was fixing an issue related to the parallelization, which I discovered while benchmarking.
The error is fixed now and when I tried with the latest commit in the repo (branch leon, now also merged to master),
pansyri -i genomes.tsv --sp --syn
did not throw an error.
Let me know if there are still issues when running the current version!
Cheers,
Leon
Hi Leon, this is the current status:
import pansyri.util as util
from pansyri.pansyn import find_multisyn
syns, alns = util.parse_input_tsv('genomes.tsv')
df = util.coresyn_from_lists(syns, alns, SYNAL=False) # Does not work
df = find_multisyn(syns, alns, SYNAL=False) # Works, but also returns crosssyn
df = find_multisyn(syns, alns, SYNAL=False, only_core=True) # Does not work
We need to ensure that this is working for all use cases.
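One way to check all of the call variants in one go is a small harness like the sketch below. `run_cases` is a hypothetical helper and not part of pansyri; the stand-in functions exist only so the sketch runs without pansyri installed.

```python
def run_cases(cases):
    """Run each zero-argument callable and record the outcome.

    Returns a dict mapping each case name to None on success,
    or to the raised exception on failure.
    """
    results = {}
    for name, func in cases.items():
        try:
            func()
            results[name] = None
        except Exception as exc:  # record the failure and keep going
            results[name] = exc
    return results


# Stand-in cases (hypothetical) so the sketch runs on its own.
def _ok():
    return "fine"


def _broken():
    raise KeyError("missing genome")


demo = run_cases({"works": _ok, "fails": _broken})
for name, err in demo.items():
    print(name, "OK" if err is None else f"FAILED: {err!r}")

# With pansyri, the cases would be the three calls listed above, e.g.
#   "coresyn_from_lists": lambda: util.coresyn_from_lists(syns, alns, SYNAL=False)
```

Collecting the exceptions instead of stopping at the first one makes it easier to see which variants fail on a given genomes.tsv.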
It seems that this issue arises when pansyri does not like the input file names in genomes.tsv, specifically how the bam/syri.out files are named.
I can reproduce the error. It's weird that this only arises when calling core synteny. On the ampril dataset, all combinations work. I'll look more into this later.
Okay, I think I might have fixed what is happening in c565b85. There was still some code specific to testing on the ampril dataset in there, which was also causing some other issues.
Earlier, it seemed to be working when the filenames were `ref_qry1.bam` and `ref_qry2.bam`, but not when they were something else. Were you able to reproduce and possibly fix that?
I think this commit should fix the need for this filename format (it was hardcoded to match the names in the Ampril dataset). I'll try to reproduce it and see if normal naming works in the next few days.
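For illustration, deriving a label for each input without relying on a fixed naming scheme could look like the sketch below. This is not the code from the commit: `label_from_path`, `labels_for` and the suffix list are hypothetical, and the only assumption is that each file's base name, minus a known suffix, identifies the genome.

```python
from pathlib import Path

# Assumed set of input suffixes; adjust to the file types actually used.
KNOWN_SUFFIXES = ("syri.out", ".bam", ".sam", ".paf")


def label_from_path(path):
    """Hypothetical helper: derive a genome label from a file path."""
    name = Path(path).name
    for suffix in KNOWN_SUFFIXES:
        # Only strip the suffix if something is left of the name afterwards.
        if name.endswith(suffix) and len(name) > len(suffix):
            name = name[: -len(suffix)]
            break
    # Drop a trailing separator left over from e.g. 'an1.syri.out'.
    return name.rstrip("._")


def labels_for(paths):
    """Return one label per path and refuse ambiguous (duplicate) labels."""
    labels = [label_from_path(p) for p in paths]
    dupes = sorted({lab for lab in labels if labels.count(lab) > 1})
    if dupes:
        raise ValueError(f"duplicate genome labels: {dupes}")
    return labels


print(labels_for(["/data/ref_qry1.bam", "/data/sampleB.bam"]))
```

Failing loudly on duplicate labels avoids silently merging two inputs that happen to share a base name.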
Ah, sorry, I forgot to test it again after the commit. My account for the HPC at Cologne has expired, so I'll test it again once the account is renewed. Testing locally, everything works on the ampril dataset, but that's not really a surprise.
I think `pansyri.pansyn.find_overlaps` is incomplete, as it gives me an error when I try to get pansyntenic regions with two highly similar (actually simulated) query genomes. The files are here: /srv/netscratch/dep_mercier/grp_schneeberger/projects/syri2/results/human/simulatedgenomes/chr22
We can discuss it when you have some time.