xtan1221 closed this issue 1 year ago.
Hi @xtan1221, I reran the pipeline and it finished without errors. I got the following files:
-rwxrwx--- 1 goel grp_schneeberger 12M Nov 2 15:55 GCA_000146045.2_R64_genomic.fna
-rwxrwx--- 1 goel grp_schneeberger 12M Nov 2 15:55 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna
-rwxrwx--- 1 goel grp_schneeberger 12M Nov 2 15:55 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
lrwxrwxrwx 1 goel grp_schneeberger 31 Nov 2 15:55 refgenome -> GCA_000146045.2_R64_genomic.fna
lrwxrwxrwx 1 goel grp_schneeberger 50 Nov 2 15:55 qrygenome -> GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
-rwxrwx--- 1 goel grp_schneeberger 486K Nov 2 15:57 out.delta
-rwxrwx--- 1 goel grp_schneeberger 159K Nov 2 15:57 out.filtered.delta
-rwxrwx--- 1 goel grp_schneeberger 67K Nov 2 15:57 out.filtered.coords
-rwxrwx--- 1 goel grp_schneeberger 352 Nov 2 16:51 mapids.txt
-rwxrwx--- 1 goel grp_schneeberger 7.1M Nov 2 16:52 syri.out
-rwxrwx--- 1 goel grp_schneeberger 12M Nov 2 16:52 syri.vcf
-rwxrwx--- 1 goel grp_schneeberger 541 Nov 2 16:52 syri.summary
-rwxrwx--- 1 goel grp_schneeberger 11K Nov 2 16:52 syri.log
Can you please check whether you have all files above mapids.txt and whether the sizes of those files match?
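One way to run this check without comparing two ls listings by eye is a small Python helper. This is a hypothetical sketch (check_inputs and EXPECTED are names made up here, not part of syri); the file list is copied from the listing above, and the helper reports each file's exact size in bytes, or None if the file is missing.

```python
import os
from pathlib import Path

# Files the example pipeline produces before syri itself runs
# (names copied from the listing above).
EXPECTED = [
    "GCA_000146045.2_R64_genomic.fna",
    "GCA_000977955.2_Sc_YJM1447_v1_genomic.fna",
    "GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered",
    "refgenome",
    "qrygenome",
    "out.delta",
    "out.filtered.delta",
    "out.filtered.coords",
]


def check_inputs(workdir, expected=EXPECTED):
    """Return {name: size_in_bytes or None}; None marks a missing file.

    Path.exists() and os.path.getsize() follow symlinks, so refgenome and
    qrygenome report the size of the FASTA file they point to, and a
    broken symlink is reported as missing.
    """
    report = {}
    for name in expected:
        path = Path(workdir) / name
        report[name] = os.path.getsize(path) if path.exists() else None
    return report
```

Running check_inputs(".") from the example/ folder on both machines and comparing the results compares exact byte counts rather than the rounded sizes (12M, 159K) that ls -lh prints.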
Hi @mnshgl0110, thank you for your response.
I do have all the files above mapids.txt, with the same sizes, before running syri:
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000146045.2_R64_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
-rw-r--r-- 1 tan staff 487K Nov 2 17:29 out.delta
-rw-r--r-- 1 tan staff 67K Nov 2 17:29 out.filtered.coords
-rw-r--r-- 1 tan staff 159K Nov 2 17:29 out.filtered.delta
lrwxr-xr-x 1 tan staff 50B Nov 2 17:28 qrygenome@ -> GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
lrwxr-xr-x 1 tan staff 31B Nov 2 17:27 refgenome@ -> GCA_000146045.2_R64_genomic.fna
Then I ran syri, and the same error occurred:
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - Matching them automatically. For each reference genome, most similar query genome will be selected. Check mapids.txt for mapping used.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "syri/pyxFiles/findshv.pyx", line 108, in syri.findshv.getsnps
KeyError: 'CP006105.2'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/tan/opt/anaconda3/envs/syri_master/bin/syri", line 6, in <module>
main(sys.argv[1:])
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/site-packages/syri/scripts/syri.py", line 326, in main
syri(args)
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/site-packages/syri/scripts/syri.py", line 252, in syri
getshv(args, coords, chrlink)
File "syri/pyxFiles/findshv.pyx", line 203, in syri.findshv.getshv
File "syri/pyxFiles/findshv.pyx", line 204, in syri.findshv.getshv
File "syri/pyxFiles/findshv.pyx", line 205, in syri.findshv.getshv
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: 'CP006105.2'
Below are the generated files:
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000146045.2_R64_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
lrwxr-xr-x 1 tan staff 31B Nov 2 17:27 refgenome@ -> GCA_000146045.2_R64_genomic.fna
lrwxr-xr-x 1 tan staff 50B Nov 2 17:28 qrygenome@ -> GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
-rw-r--r-- 1 tan staff 487K Nov 2 17:29 out.delta
-rw-r--r-- 1 tan staff 159K Nov 2 17:29 out.filtered.delta
-rw-r--r-- 1 tan staff 67K Nov 2 17:29 out.filtered.coords
-rw-r--r-- 1 tan staff 352B Nov 2 17:36 mapids.txt
-rw-r--r-- 1 tan staff 274B Nov 2 17:36 invOut.txt
-rw-r--r-- 1 tan staff 684B Nov 2 17:36 TLOut.txt
-rw-r--r-- 1 tan staff 838B Nov 2 17:36 invTLOut.txt
-rw-r--r-- 1 tan staff 6.7K Nov 2 17:36 dupOut.txt
-rw-r--r-- 1 tan staff 728B Nov 2 17:36 invDupOut.txt
-rw-r--r-- 1 tan staff 31K Nov 2 17:36 ctxOut.txt
-rw-r--r-- 1 tan staff 19K Nov 2 17:36 synOut.txt
-rw-r--r-- 1 tan staff 95K Nov 2 17:36 sv.txt
-rw-r--r-- 1 tan staff 10K Nov 2 17:36 notAligned.txt
-rw-r--r-- 1 tan staff 2.6K Nov 2 17:36 syri.log
-rw-r--r-- 1 tan staff 0B Nov 2 17:36 snps_init.txt
It looks like several intermediate files were generated (I assume), but the run stopped at the step that finds SNPs and small indels (snps_init.txt is empty, as shown above). Below is the content of the syri.log file:
2022-11-02 17:41:14,037 - Reading Coords - INFO - syri:135 - Reading input from .tsv file
2022-11-02 17:41:14,047 - Reading Coords - WARNING - syri:135 - Chromosomes IDs do not match.
2022-11-02 17:41:14,048 - Reading Coords - WARNING - syri:135 - Matching them automatically. For each reference genome, most similar query genome will be selected. Check mapids.txt for mapping used.
2022-11-02 17:41:14,211 - Reading Coords - INFO - syri:135 - setting CP006105.2 as BK006934.2
2022-11-02 17:41:14,211 - Reading Coords - INFO - syri:135 - setting CP004488.2 as BK006935.2
2022-11-02 17:41:14,212 - Reading Coords - INFO - syri:135 - setting CP004578.2 as BK006936.2
2022-11-02 17:41:14,212 - Reading Coords - INFO - syri:135 - setting CP006317.1 as BK006937.2
2022-11-02 17:41:14,212 - Reading Coords - INFO - syri:135 - setting CP004738.2 as BK006938.2
2022-11-02 17:41:14,213 - Reading Coords - INFO - syri:135 - setting CP004833.2 as BK006939.2
2022-11-02 17:41:14,213 - Reading Coords - INFO - syri:135 - setting CP004968.2 as BK006940.2
2022-11-02 17:41:14,213 - Reading Coords - INFO - syri:135 - setting CP005272.2 as BK006941.2
2022-11-02 17:41:14,213 - Reading Coords - INFO - syri:135 - setting CP005061.2 as BK006942.2
2022-11-02 17:41:14,214 - Reading Coords - INFO - syri:135 - setting CP005174.1 as BK006943.2
2022-11-02 17:41:14,214 - Reading Coords - INFO - syri:135 - setting CP005369.2 as BK006944.2
2022-11-02 17:41:14,214 - Reading Coords - INFO - syri:135 - setting CP006421.1 as BK006945.2
2022-11-02 17:41:14,215 - Reading Coords - INFO - syri:135 - setting CP005470.2 as BK006946.2
2022-11-02 17:41:14,215 - Reading Coords - INFO - syri:135 - setting CP005572.1 as BK006947.3
2022-11-02 17:41:14,215 - Reading Coords - INFO - syri:135 - setting CP005666.2 as BK006948.2
2022-11-02 17:41:14,215 - Reading Coords - INFO - syri:135 - setting CP006197.2 as BK006949.2
2022-11-02 17:41:14,345 - syri - INFO - syri:214 - starting
2022-11-02 17:41:14,346 - syri - INFO - syri:214 - Analysing chromosomes: ['BK006934.2', 'BK006935.2', 'BK006936.2', 'BK006937.2', 'BK006938.2', 'BK006939.2', 'BK006940.2', 'BK006941.2', 'BK006942.2', 'BK006943.2', 'BK006944.2', 'BK006945.2', 'BK006946.2', 'BK006947.3', 'BK006948.2', 'BK006949.2']
2022-11-02 17:41:15,965 - getCTX - INFO - syri:214 - Identifying cross-chromosomal translocation and duplication for chromosome2022-11-02 17:41:15.965046
2022-11-02 17:41:19,753 - local_variation - INFO - syri:225 - Finding SVs in synOut.txt, invOut.txt, TLOut.txt, invTLOut.txt, ctxOut.txt
2022-11-02 17:41:20,542 - local_variation - INFO - syri:245 - Finding SNPs and small indels
Any idea why this happens?
Thanks!
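The log shows syri renaming each query chromosome (CP...) to a reference chromosome (BK...) because the IDs do not match. A rough sketch of that kind of matching follows. It assumes "most similar" means "shares the most aligned bases", which may not be syri's actual criterion. match_chromosomes is a hypothetical name, and the alignment lengths in the example are toy numbers, not values from this run.

```python
from collections import defaultdict


def match_chromosomes(alignments):
    """Map each reference chromosome to the query chromosome that shares the
    most aligned bases with it.

    `alignments` is an iterable of (ref_id, qry_id, aligned_length) tuples,
    e.g. taken from the columns of out.filtered.coords. This is a simplified
    stand-in for automatic ID matching, not syri's implementation.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ref_id, qry_id, length in alignments:
        totals[ref_id][qry_id] += length
    # For every reference chromosome, keep the query with the largest total.
    return {ref: max(hits, key=hits.get) for ref, hits in totals.items()}


# Toy input: the lengths are invented for illustration only.
toy = [
    ("BK006934.2", "CP006105.2", 200000),
    ("BK006934.2", "CP004488.2", 1500),
    ("BK006935.2", "CP004488.2", 800000),
]
print(match_chromosomes(toy))
# {'BK006934.2': 'CP006105.2', 'BK006935.2': 'CP004488.2'}
```

The mapping only renames chromosomes; it does not explain the KeyError by itself, which points at the later SNP step instead.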
I think this happens because syri cannot run show-snps from MUMmer. Could you please check that show-snps is in your PATH? You can also try running syri with the -s parameter.
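A quick way to check this from Python is shutil.which. Checking from a Python process reflects the PATH that a Python program such as syri inherits, which can differ from an interactive shell (for example, if the conda environment is not activated). find_tool is a hypothetical helper name; its optional path argument exists only so the function can be tested against a custom directory.

```python
import shutil


def find_tool(name="show-snps", path=None):
    """Return the full path of executable `name` if it is found on PATH
    (or in `path`, an os.pathsep-separated string, if given), else None."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_tool()
    print(location if location else "show-snps was not found on PATH")
```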
@mnshgl0110 show-snps from MUMmer is installed and can be run from anywhere. I also ran it directly as a test:
show-snps out.filtered.delta >test-show-snps.txt
and the output file was successfully generated without any error:
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000146045.2_R64_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna
-rw-r--r-- 1 tan staff 12M Nov 2 17:27 GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
-rw-r--r-- 1 tan staff 684B Nov 2 17:41 TLOut.txt
-rw-r--r-- 1 tan staff 31K Nov 2 17:41 ctxOut.txt
-rw-r--r-- 1 tan staff 6.7K Nov 2 17:41 dupOut.txt
-rw-r--r-- 1 tan staff 728B Nov 2 17:41 invDupOut.txt
-rw-r--r-- 1 tan staff 274B Nov 2 17:41 invOut.txt
-rw-r--r-- 1 tan staff 838B Nov 2 17:41 invTLOut.txt
-rw-r--r-- 1 tan staff 352B Nov 2 17:41 mapids.txt
-rw-r--r-- 1 tan staff 10K Nov 2 17:41 notAligned.txt
-rw-r--r-- 1 tan staff 487K Nov 2 17:29 out.delta
-rw-r--r-- 1 tan staff 67K Nov 2 17:29 out.filtered.coords
-rw-r--r-- 1 tan staff 159K Nov 2 17:29 out.filtered.delta
lrwxr-xr-x 1 tan staff 50B Nov 2 17:28 qrygenome@ -> GCA_000977955.2_Sc_YJM1447_v1_genomic.fna.filtered
lrwxr-xr-x 1 tan staff 31B Nov 2 17:27 refgenome@ -> GCA_000146045.2_R64_genomic.fna
-rw-r--r-- 1 tan staff 0B Nov 2 17:41 snps_init.txt
-rw-r--r-- 1 tan staff 95K Nov 2 17:41 sv.txt
-rw-r--r-- 1 tan staff 19K Nov 2 17:41 synOut.txt
-rw-r--r-- 1 tan staff 0B Nov 3 17:04 syri.log
-rw-r--r-- 1 tan staff 13M Nov 3 17:07 test-show-snps.txt
So I guess show-snps is not what is causing the problem?
Syri calls show-snps and saves the output in the snps_init.txt file. Later, it reads that file and selects the variants for each chromosome. Currently, snps_init.txt is empty (it should not be), which suggests that the reported error happens when syri tries to get the variants for each chromosome from it.
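To illustrate this failure mode: if the per-chromosome lookup is a plain dictionary built from an empty file, the very first lookup raises a KeyError naming a chromosome, just like KeyError: 'CP006105.2' in the traceback. The sketch below is a toy reconstruction, not syri's code. The tab-separated layout with the chromosome ID in the last column is an assumption, and group_by_chromosome and variants_for are hypothetical names.

```python
from collections import defaultdict


def group_by_chromosome(lines):
    """Group tab-separated variant records by chromosome ID.

    Assumed (hypothetical) layout: the chromosome ID is the last field.
    Blank lines are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            groups[fields[-1]].append(fields)
    return dict(groups)


def variants_for(groups, chrom):
    """Return the records for `chrom`, with a clearer message than a bare
    KeyError when there are none."""
    try:
        return groups[chrom]
    except KeyError:
        raise KeyError(f"no variant records for {chrom!r}; is snps_init.txt empty?") from None


groups = group_by_chromosome([])  # an empty snps_init.txt yields no groups
# variants_for(groups, "CP006105.2") now raises KeyError, mirroring the traceback
```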
Did you also try with the -s parameter?
This could also be a Mac issue. Currently, syri starts a subprocess to run show-snps, and I wonder whether macOS is not happy with that. If possible, could you please try running syri on Linux? Alternatively, you can use BAM/PAF files as input; then syri would not use show-snps, and you would probably not get the error.
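A defensive version of "run show-snps in a subprocess and save its output" might look like the sketch below. It is not syri's implementation: run_to_file is a hypothetical helper that raises on a non-zero exit status and also when the output file ends up empty, so a silent subprocess failure would surface as a clear error instead of a later KeyError.

```python
import subprocess
from pathlib import Path


def run_to_file(cmd, out_path):
    """Run `cmd`, write its stdout to `out_path`, and fail loudly.

    Raises CalledProcessError on a non-zero exit status, and RuntimeError
    if the command exits cleanly but writes nothing (the symptom seen with
    the empty snps_init.txt). A missing executable raises FileNotFoundError.
    """
    out_path = Path(out_path)
    with out_path.open("w") as fh:
        # stdout goes straight to the file; stderr is captured for error messages.
        result = subprocess.run(cmd, stdout=fh, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise subprocess.CalledProcessError(result.returncode, cmd, stderr=result.stderr)
    if out_path.stat().st_size == 0:
        raise RuntimeError(f"{cmd[0]} produced no output; stderr: {result.stderr!r}")
    return out_path
```

For example, run_to_file(["show-snps", "out.filtered.delta"], "snps_init.txt") would repeat the manual test from earlier in this thread. The exact flags syri passes to show-snps are not shown here.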
I ran SyRI on Linux and no error occurred, so I think this is a macOS issue. Thanks for the help!
I successfully installed the latest version on macOS Monterey (M1 Pro chip) from bioconda:
conda install cython=0.29.23 numpy=1.21.2 scipy=1.6.2 pandas=1.2.4 python-igraph=0.9.1 psutil=5.8.0 pysam=0.16.0.1 matplotlib=3.3.4
pip install .
Then I tried to run pipeline.sh in the example/ folder with MUMmer:
syri -c out.filtered.coords -d out.filtered.delta -r refgenome -q qrygenome
and the following error message was generated:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "syri/pyxFiles/findshv.pyx", line 108, in syri.findshv.getsnps
KeyError: 'CP006105.2'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/tan/opt/anaconda3/envs/syri_master/bin/syri", line 6, in <module>
main(sys.argv[1:])
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/site-packages/syri/scripts/syri.py", line 326, in main
syri(args)
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/site-packages/syri/scripts/syri.py", line 252, in syri
getshv(args, coords, chrlink)
File "syri/pyxFiles/findshv.pyx", line 203, in syri.findshv.getshv
File "syri/pyxFiles/findshv.pyx", line 204, in syri.findshv.getshv
File "syri/pyxFiles/findshv.pyx", line 205, in syri.findshv.getshv
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/tan/opt/anaconda3/envs/syri_master/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: 'CP006105.2'