singleron-RD / CeleScope

Single Cell Analysis Pipelines
https://www.singleron.bio/
MIT License

Issues constructing the reference for VDJ sequencing and full-length immune receptor sequencing #269

Open Gethell opened 10 months ago

Gethell commented 10 months ago

Description: When I followed the tutorial in the official tweets, I ran into problems with the following command: celescope vdj mkref human TR. It failed with:

Traceback (most recent call last):
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/bin/celescope", line 8, in <module>
    sys.exit(main())
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/celescope.py", line 54, in main
    args.func(args)
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 81, in mkref
    runner()
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 35, in __call__
    self.combine_seq()
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/tools/utils.py", line 45, in wrapper
    result = func(*args, **kwargs)
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 59, in combine_seq
    assert len(imgt_files) == 7
AssertionError

When I ran celescope vdj mkref human IG, I hit a different error:

Building a new DB, current time: 11/01/2023 15:27:28
New DB name: /bios-store1/home/logcoin/Reference/hs_vdj/Homo_sapiens/IG/human_cele_BR/IGV.fa
New DB title: IGV.fa
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /bios-store1/home/logcoin/Reference/hs_vdj/Homo_sapiens/IG/human_cele_BR/IGV.fa
Keep MBits: T
Maximum file size: 3000000000B
BLAST options error: File IGV.fa is empty
Traceback (most recent call last):
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/bin/celescope", line 8, in <module>
    sys.exit(main())
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/celescope.py", line 54, in main
    args.func(args)
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 81, in mkref
    runner()
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 36, in __call__
    self.build_index()
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/tools/utils.py", line 45, in wrapper
    result = func(*args, **kwargs)
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 76, in build_index
    subprocess.check_call(f"makeblastdb -parse_seqids -dbtype nucl -in {out_file_name}.fa", shell=True)
  File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'makeblastdb -parse_seqids -dbtype nucl -in IGV.fa' returned non-zero exit status 1.

Troubleshooting: I think the first issue might be caused by a wrong file format, but I have no idea what causes the second one, so I did some troubleshooting:

  1. The format of the IMGT reference files is as follows: head -n 20 TRAJ.fasta (screenshot: trouble1)

     head -n 20 IGHD.fasta (screenshot: trouble3)

  2. I checked whether the output directory is empty: ls -al ../IG/human_IG (screenshot: trouble2), but the "TR" directory is empty.
  3. Finally, I checked the CeleScope version: celescope -v reports 2.0.3.

I would sincerely appreciate help resolving these issues, or an example of the correct IMGT data format for constructing the index files. I would also appreciate a detailed example of how to construct the index files for VDJ or full-length immune receptor sequencing. One last question: are the IMGT-based index files for single-cell VDJ and for full-length immune receptor sequencing the same?
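The second error in the post above comes from BLAST itself ("BLAST options error: File IGV.fa is empty"), so one quick check before indexing is whether any of the combined FASTA files contain no records at all. The following is a minimal sketch of such a check, not part of CeleScope; the function name, the `*.fa` pattern, and the directory argument are assumptions made for illustration.

```python
from pathlib import Path


def empty_fasta_files(directory, pattern="*.fa"):
    """Return sorted names of files matching `pattern` that hold no FASTA record.

    A file counts as empty here if it has no line starting with '>'.
    Hypothetical helper for illustration; not a CeleScope function.
    """
    empty = []
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as handle:
            has_record = any(line.startswith(">") for line in handle)
        if not has_record:
            empty.append(path.name)
    return empty


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory that holds IGV.fa etc.
    print(empty_fasta_files("."))
```

If a file such as IGV.fa shows up in the result, the problem lies in the input files used to build it rather than in makeblastdb.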

Chenjunjie1996 commented 10 months ago

  1. Make sure your celescope version is >=1.15.1 and refer to https://github.com/singleron-RD/CeleScope/blob/master/doc/assay/multi_vdj.md.
  2. The IMGT reference and the full-length vdj reference are different.

Gethell commented 10 months ago

Thanks for your response! I followed the tutorial you linked, but unfortunately I ran into another problem when running these commands:

wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta

and

wget http://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Mus_musculus/IG/IG{H,K,L}{V,J}.fasta

The error was:

--2023-11-03 10:47:00-- https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR%7BA,B%7D%7BV,J%7D.fasta
Resolving www.imgt.org (www.imgt.org)... 195.83.84.12
Connecting to www.imgt.org (www.imgt.org)|195.83.84.12|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-11-03 10:47:01 ERROR 404: Not Found.

(screenshot: trouble4)

Chenjunjie1996 commented 10 months ago

Remove the backslash characters (\) from the command. The correct command is:

wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta
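The 404 in the log above shows the braces reaching the server URL-encoded (%7B and %7D), which means the shell passed them through literally instead of expanding them. With unescaped braces, bash expands the pattern into four separate URLs before wget runs. The sketch below reproduces that expansion in Python to make the resulting URLs explicit; `expand_braces` is a hypothetical helper written for this illustration and only handles simple, non-nested groups with two or more options.

```python
import itertools
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand simple, non-nested {a,b} groups in the order bash would.

    Hypothetical helper for illustration; it does not cover nested groups,
    ranges such as {1..3}, or single-option groups.
    """
    # Capturing group keeps the brace groups in the split result,
    # e.g. ["...TR", "{A,B}", "", "{V,J}", ".fasta"].
    parts = re.split(r"(\{[^{}]*\})", pattern)
    choices = [
        part[1:-1].split(",") if part.startswith("{") and part.endswith("}") else [part]
        for part in parts
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


BASE = "https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/"

if __name__ == "__main__":
    for url in expand_braces(BASE + "TR{A,B}{V,J}.fasta"):
        print(url)
```

The expansion yields TRAV.fasta, TRAJ.fasta, TRBV.fasta and TRBJ.fasta under the same base URL, so downloading those four files individually is equivalent to the single wget command.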

Gethell commented 9 months ago

Thanks for your response. I finally succeeded by following your advice!

Gethell commented 9 months ago

I found the root cause: some unwanted files were left in the directory after I manually downloaded the data from IMGT. For instance, a redundant file named "TRGJ" in the directory Path/TR/ caused the reference construction to fail. (screenshot: solution1)
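A leftover file like the one described above can be spotted before running mkref by comparing the directory contents against the files that are meant to be there. The sketch below assumes the expected set is the four alpha/beta files named in the wget command earlier in this thread; the exact set that mkref expects is not stated here, so `EXPECTED_TR` and both helper functions are illustrative assumptions rather than CeleScope code.

```python
from pathlib import Path

# Assumption: only the four alpha/beta TCR files from the wget command
# in this thread should be present in the TR directory.
EXPECTED_TR = {"TRAV.fasta", "TRAJ.fasta", "TRBV.fasta", "TRBJ.fasta"}


def unexpected_fasta(directory, expected=EXPECTED_TR):
    """Return sorted names of *.fasta files in `directory` not listed in `expected`."""
    present = {p.name for p in Path(directory).glob("*.fasta")}
    return sorted(present - set(expected))


def missing_fasta(directory, expected=EXPECTED_TR):
    """Return sorted names from `expected` that are absent from `directory`."""
    present = {p.name for p in Path(directory).glob("*.fasta")}
    return sorted(set(expected) - present)


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding the IMGT files.
    print("unexpected:", unexpected_fasta("."))
    print("missing:", missing_fasta("."))
```

Any name reported as unexpected (for example a gamma-chain file) can be moved out of the directory before building the reference.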

Chenjunjie1996 commented 9 months ago

Our vdj pipeline focuses on alpha/beta TCR. Use the following command to download only the alpha/beta TCR sequences from IMGT, which avoids the error when running celescope vdj mkref.

wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta