schneebergerlab / plotsr

Tool to plot synteny and structural rearrangements between genomes
MIT License

Errors running 3 way genome comparison #11

Closed: annerilotter closed this issue 2 years ago

annerilotter commented 2 years ago

Hi

I am trying to run plotsr but get the following error:

Traceback (most recent call last):
  File "/home/FCAM/alotter/miniconda3/envs/py38/bin/plotsr", line 4, in <module>
    __import__('pkg_resources').run_script('plotsr==0.5.2', 'plotsr')
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 656, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1460, in run_script
    exec(script_code, namespace, namespace)
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/plotsr-0.5.2-py3.8.egg/EGG-INFO/scripts/plotsr", line 6, in <module>
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/plotsr-0.5.2-py3.8.egg/plotsr/main.py", line 54, in main
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/plotsr-0.5.2-py3.8.egg/plotsr/plotsr.py", line 161, in plotsr
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/site-packages/plotsr-0.5.2-py3.8.egg/plotsr/func.py", line 824, in validalign2fasta
  File "/home/FCAM/alotter/miniconda3/envs/py38/lib/python3.8/posixpath.py", line 142, in basename
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
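The last frame shows `os.path.basename()` receiving an open file object (`_io.TextIOWrapper`) instead of a path. The snippet below is a hypothetical illustration of that failure mode and of one common way to avoid it, which is to read the path from the file object's `.name` attribute. `display_name` is not a plotsr function, and this is not necessarily how the plotsr fix was written.

```python
import os
import tempfile


def display_name(path_or_file):
    """Return the base name of a path, or of the path an open file was opened with."""
    # os.path.basename() calls os.fspath(), which rejects file objects such as
    # io.TextIOWrapper; an open file keeps its path in the .name attribute.
    if hasattr(path_or_file, "read"):
        path_or_file = path_or_file.name
    return os.path.basename(path_or_file)


# Create an empty temporary file so the demo does not depend on real inputs.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp_path = tmp.name

with open(tmp_path) as fh:  # fh is an io.TextIOWrapper
    try:
        os.path.basename(fh)  # raises the same kind of TypeError as in the traceback
    except TypeError as err:
        print("TypeError:", err)
    print(display_name(fh))  # prints the temporary file's base name

os.remove(tmp_path)
```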

mnshgl0110 commented 2 years ago

Based on this error message alone, it is difficult to say why you are getting this error. Is this the entire error message, or was there more? Could you please also share the command that you used and your genomes.txt?

annerilotter commented 2 years ago

Hi

Here is the command I used:

plotsr \
    --sr v2_eghap.out \
    --sr eghap_euhap.out \
    --sr euhap_v2syri.out \
    --chrord chrord.txt \
    --genomes genomes.txt \
    -o egv2_eg_eu.svg

And the genomes.txt:

#file	name	tags
eg.v2.chr.fasta	EGRv2	lc:#C00000
eg.chr.fasta	EGR	lc:#00B050
eu.chr.fasta	EUR	lc:#1E6CD1

That is the entire error message.

mnshgl0110 commented 2 years ago

The first input filename is 'v2_eghap.out', not 'v2_eghapsyri.out'. Is this the correct input file? Other than that, everything looks fine, so unfortunately I don't see any obvious reason why this could be happening. Have you tried plotting the example data? That would show whether the issue is with plotsr or with how your files are set up.

Also, please make sure that the columns in genomes.txt are separated by tabs and not spaces.
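A quick way to catch space-separated columns is to split each line of genomes.txt on tabs and flag lines that do not yield at least the file and name columns. This is a minimal sketch. It assumes, as in the genomes.txt shown above, that every non-header line holds a FASTA path, a genome name and tags. `check_genomes_txt` is a hypothetical helper, not part of plotsr.

```python
def check_genomes_txt(text):
    """Return (line_number, message) pairs for genomes.txt lines that look mis-separated."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and the '#file name tags' header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            # Entirely space-separated lines end up as a single field.
            problems.append((lineno, "fewer than 2 tab-separated columns"))
        elif any(" " in field for field in fields[:2]):
            # Mixed tabs and spaces leave a space inside the file or name column.
            problems.append((lineno, "space inside the file or name column"))
    return problems


spaced = "#file name tags\neg.chr.fasta EGR lc:#00B050\n"
print(check_genomes_txt(spaced))  # [(2, 'fewer than 2 tab-separated columns')]
```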

Could you generate the plot with two samples? Maybe test different combinations of two samples. Also, try without the --chrord option. Otherwise, do you have a minimal example (genome FASTAs, syri.out files, and genomes.txt) that you can share so I can check it?

annerilotter commented 2 years ago

Hi

I do not have an example file, but I used plotsr with syri before this version, which supports multiple comparisons, was released (it worked then with only two genomes). I am running the example files now, as well as syri again, and will see how it goes.

mnshgl0110 commented 2 years ago

Hi, I found a bug in one of the error-handling lines. It is fixed now. You can update plotsr using the following commands:

git clone https://github.com/schneebergerlab/plotsr.git
cd plotsr
python setup.py install

However, as the bug was in the handling of an error, I would expect the underlying error to still occur, so plotsr would probably still not work out of the box for your data.

The error happens because an alignment coordinate is larger than the length of the corresponding chromosome in the genome FASTA. This should not be possible and suggests an inconsistency between the files, so you would have to check for that.
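To find such inconsistencies, you can compare chromosome lengths from each FASTA with the coordinates in syri.out. The sketch below does that. It assumes the usual tab-separated syri.out layout: reference chromosome, start and end in columns 1 to 3; query chromosome, start and end in columns 6 to 8 (1-based); `-` for missing values. It also assumes that FASTA IDs are the header text up to the first whitespace. Both helper functions are hypothetical and are not part of plotsr or SyRI.

```python
def fasta_lengths(lines):
    """Map each FASTA record ID (header text up to the first whitespace) to its length."""
    lengths = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            lengths[seq_id] = 0
        elif line and seq_id is not None:
            lengths[seq_id] += len(line)
    return lengths


def out_of_bounds(syri_lines, ref_lengths, qry_lengths):
    """Yield (line_number, genome, chrom, coordinate, chrom_length) for coordinates past a chromosome end."""
    for lineno, line in enumerate(syri_lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a syri.out data row
        for genome, chrom, coords, lengths in (
            ("reference", cols[0], cols[1:3], ref_lengths),
            ("query", cols[5], cols[6:8], qry_lengths),
        ):
            # Skip '-' and IDs absent from the FASTA (name mismatches need a separate check).
            if chrom not in lengths:
                continue
            for value in coords:
                if value.isdigit() and int(value) > lengths[chrom]:
                    yield lineno, genome, chrom, int(value), lengths[chrom]
```

In practice you would call `fasta_lengths(open("eg.v2.chr.fasta"))` for the reference and query FASTAs of each comparison, then pass the lines of the matching syri.out to `out_of_bounds`.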

Best Manish

annerilotter commented 2 years ago

Hi

So, it seems that plotsr is fine with the first two files, but the third one is an issue. I even reran with SyRI v1.5 (although the number of rearrangements then differs, which I assume could be because I did not also rerun nucmer to regenerate the input alignments) and it still gave an error. The third comparison was supposed to go back to the original first reference, i.e. the first genome in the plot, which is somewhat larger than the other two assemblies.

I also have a question about the visualization. Using the original output files, the number of duplications seems a lot higher than with the plotsr that was still part of SyRI. Also, some previous translocations (green) changed to duplications and vice versa. Does the new version handle the information differently?

Kind regards

mnshgl0110 commented 2 years ago

I just realised that you have three comparisons (AvsB, BvsC, CvsA), so that would require four genomes (A, B, C, A) in the genomes.txt file. Sorry for not noticing that earlier.
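The rule here is that comparison i must be genome i (reference) against genome i + 1 (query) in genomes.txt. That means n comparisons need n + 1 genomes. The sketch below checks this with plain genome labels, under the assumption that you list the `--sr` comparisons in the order they are passed to plotsr. `check_chain` is a hypothetical helper and does not read plotsr's files itself.

```python
def check_chain(genomes, comparisons):
    """Check that comparison i is genomes[i] (reference) vs genomes[i + 1] (query)."""
    errors = []
    n = len(comparisons)
    if len(genomes) != n + 1:
        errors.append(f"{n} comparisons need {n + 1} genomes, got {len(genomes)}")
    for i, (ref, qry) in enumerate(comparisons):
        if i + 1 >= len(genomes):
            break  # not enough genomes left to compare against
        expected = (genomes[i], genomes[i + 1])
        if (ref, qry) != expected:
            errors.append(
                f"comparison {i + 1} is {ref} vs {qry}, expected {expected[0]} vs {expected[1]}"
            )
    return errors


comparisons = [("A", "B"), ("B", "C"), ("C", "A")]
print(check_chain(["A", "B", "C"], comparisons))       # ['3 comparisons need 4 genomes, got 3']
print(check_chain(["A", "B", "C", "A"], comparisons))  # []
```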

the number of duplications seem a lot higher than when I used plotsr that was still part of SyRI?

I cannot think of any changes that would increase the number of duplications compared with earlier versions. In any case, plotsr does not call duplications itself, so as long as all the plotted duplications are in the input file, I would consider this to be OK.
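One way to confirm that every plotted duplication is in the input is to count the annotation types in syri.out and compare the totals with the plot. A minimal sketch follows. It assumes that the annotation type (for example DUP, INVDP, or alignment-level rows such as INVDPAL) sits in the 11th tab-separated column. `count_annotations` is a hypothetical helper.

```python
from collections import Counter


def count_annotations(syri_lines, column=10):
    """Count annotation types in syri.out lines (default: 11th column, 0-based index 10)."""
    counts = Counter()
    for line in syri_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > column:
            counts[cols[column]] += 1
    return counts


# Usage: count duplication regions, e.g.
# counts = count_annotations(open("v2_eghap.out"))
# print(counts["DUP"], counts["INVDP"])
```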

Also some previous translocation (green) changed to duplications and vice versa. Does it handle the information differently?

For highly repetitive regions, syri uses some heuristics, which could result in this. Or do you mean that, with the same input, plotsr plotted different translocations/duplications? If the changes are happening in the syri output, then it is OK; if they are happening in plotsr, then it is possibly a bug.

annerilotter commented 2 years ago

Hi

I meant that the plot looks different with the new plotsr package when using the same syri.out input file, for example:

Plot from the plotsr Python script bundled with SyRI:

[image]

vs

Plot from the new plotsr version:

[image]

This is the same chromosome and the same syri.out file, with the same query and reference.

Adding the 4th FASTA file worked. Thanks :). For interest's sake, would it still work if a comparison in the syri.out file was in the wrong order for the visualization (i.e. reference and query switched)?

mnshgl0110 commented 2 years ago

Hi,

would it still work if comparison in the syri.out file was in the wrong order for the visualization

It is possible to visualise AvsB, BvsA, AvsB, and so on; the requirement is that the order of genomes in the comparisons is the same as in genomes.txt. So, in this case, genomes.txt would need to list genomes A, B, A, and B. I hope this makes it clear.
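Put the other way round, the genomes.txt order can be derived from the ordered list of comparisons. It is the first reference followed by each comparison's query. The sketch below builds that order and writes tab-separated genomes.txt text in the `#file name tags` layout used earlier in this thread. The function names, FASTA paths and tags are hypothetical placeholders.

```python
def genome_order(comparisons):
    """Derive the genomes.txt order (top to bottom) from ordered (reference, query) pairs."""
    if not comparisons:
        return []
    order = [comparisons[0][0]]
    for i, (ref, qry) in enumerate(comparisons):
        # Each comparison must start from the genome the previous one ended on.
        if ref != order[-1]:
            raise ValueError(
                f"comparison {i + 1} starts with {ref}, but the previous genome is {order[-1]}"
            )
        order.append(qry)
    return order


def genomes_txt(order, fasta, tags):
    """Render tab-separated genomes.txt text; fasta and tags map genome name -> value."""
    lines = ["#file\tname\ttags"]
    for name in order:
        lines.append(f"{fasta[name]}\t{name}\t{tags[name]}")
    return "\n".join(lines) + "\n"


order = genome_order([("A", "B"), ("B", "A"), ("A", "B")])
print(order)  # ['A', 'B', 'A', 'B']
print(genomes_txt(order, {"A": "a.fasta", "B": "b.fasta"},
                  {"A": "lc:#C00000", "B": "lc:#00B050"}), end="")
```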

I agree, the plots do look different. I guess that the curved paths are just more visible, making it look like there are more SRs. The translocation/duplication switch in the middle is weird, though. Can you please check which annotation syri.out gives for that region? I suspect a bug, but I am not sure whether it is in the old version or the new one :D

Best Manish

annerilotter commented 2 years ago

Yeah, it seems syri called it an INVDPAL, so it is a duplication then. I wonder why the old plotsr showed it as a translocation.

[image]

Yes, thanks, the explanation is clear.

mnshgl0110 commented 2 years ago

Great! I will close this issue then.