singleron-RD / CeleScope

Single Cell Analysis Pipelines
https://www.singleron.bio/
MIT License

Multiple errors trying to analyze flv_trust4 data #288

Open panapapa14 opened 6 months ago

panapapa14 commented 6 months ago

As I mentioned above, I am trying to find out what is going on with some flv_trust4 data, so far without success. On top of the lack of efficient guidance from the company, we have made numerous attempts and repeatedly get the following error:

multi_flv_trust4 \
--mapfile ./vdj.mapfile \
--ref GRCm38 \
--thread 8 \
--seqtype TCR \
--mod shell

CONDA_DEFAULT_ENV is not set. sjm mode may not available.
2024-05-31 13:38:17,140 - celescope.tools.multi.parse_mapfile - INFO - start...
Allowed R1 patterns:
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001_1.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001_1.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001_1.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001_1.fastq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1.fastq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1_001.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1_001.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1_001.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz/C231130001R1_001.fastq.gz
Traceback (most recent call last):
  File "/home/diopap/.local/bin/multi_flv_trust4", line 8, in <module>
    sys.exit(main())
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/flv_trust4/multi_flv_trust4.py", line 80, in main
    multi.run()
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/multi.py", line 420, in run
    self.prepare()
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/multi.py", line 199, in prepare
    self.fq_dict, self.col4_dict, self.col5_dict = self.parse_mapfile(self.args.mapfile, self.col4_default, self.args.use_R3)
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/utils.py", line 45, in wrapper
    result = func(*args, **kwargs)
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/multi.py", line 149, in parse_mapfile
    fq1, fq2 = get_fq(library_id, library_path, use_R3)
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/multi.py", line 452, in get_fq
    fq1_list = get_read(library_id, library_path, read='1')
  File "/home/diopap/.local/lib/python3.10/site-packages/celescope/tools/multi.py", line 442, in get_read
    raise Exception(
Exception: Invalid Read1 path!
library_id: C231130001
library_path: /mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001sCT.fastq.gz

diopap@DIO:/mnt/c/users/dipap/Documents/tcr$ ls

zhouyiqi91 commented 6 months ago

https://github.com/singleron-RD/CeleScope/blob/master/doc/assay/multi_flv_trust4.md#arguments

When running multi_flv_trust4, the mapfile needs 4 columns.
1st column: Fastq file prefix
2nd column: Fastq file directory path

The full path of the R1 fastq files will be {Fastq file directory path}/{Fastq file prefix}_*_{R1 Fastq file suffix}. There are several allowed R1 fastq file suffixes, e.g. R1.fq.gz, R1.fastq.gz, etc. You can find all the valid R1 fastq patterns in the log. It seems that the 2nd column should be /mnt/c/users/dipap/Documents/tcr/fastq_files_backup/, without C231130001_sCT_.fastq.gz
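
To see in advance whether a prefix and directory will resolve to an R1 file, the lookup can be approximated outside CeleScope. The following is only a rough sketch, not CeleScope's own code: the function names `r1_patterns` and `find_r1` are hypothetical, the suffix list is copied from the "Allowed R1 patterns" log in this thread, and the wildcard between prefix and suffix is an assumption.

```python
import glob
import os

# Read-1 tags and extensions as they appear in the "Allowed R1 patterns" log.
READ1_TAGS = ["_1", "R1", "R1_001"]
EXTENSIONS = ["fq", "fq.gz", "fastq", "fastq.gz"]


def r1_patterns(prefix, directory):
    """Build glob patterns of the form {directory}/{prefix}*{tag}.{ext}."""
    return [
        os.path.join(directory, f"{prefix}*{tag}.{ext}")
        for tag in READ1_TAGS
        for ext in EXTENSIONS
    ]


def find_r1(prefix, directory):
    """Return the sorted R1 files matching any pattern (empty list if none)."""
    hits = set()
    for pattern in r1_patterns(prefix, directory):
        hits.update(glob.glob(pattern))
    return sorted(hits)
```

Calling `find_r1("C231130001", "/mnt/c/users/dipap/Documents/tcr/fastq_files_backup")` returns an empty list when the second argument is a fastq file rather than the directory that contains it, which is the situation reported above.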

Allowed R1 patterns:
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001_1.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001_1.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001_1.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001_1.fastq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1.fastq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1_001.fq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1_001.fq.gz
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1_001.fastq
/mnt/c/users/dipap/Documents/tcr/fastq_files_backup/C231130001_sCT_.fastq.gz/C231130001R1_001.fastq.gz

diopapamath commented 6 months ago

The error seems to have gone away. However, although multi_flv_trust4 now runs without any errors, it finishes very quickly and produces no output, so something is still wrong. I am sharing my files below to help pinpoint which file or configuration causes the command to complete so quickly with no output.

This is what I get:

(tcr_analysis) diopap@DIO:/mnt/c/users/dipap/Documents/tcr$ multi_flv_trust4 --mapfile ./vdj.mapfile --ref GRCm38 --thread 8 --seqtype TCR --mod shell
2024-06-01 19:36:20,839 - celescope.tools.multi.parse_mapfile - INFO - start...
2024-06-01 19:36:20,924 - celescope.tools.multi.parse_mapfile - INFO - done. time used: 0:00:00.085204

My path to the files is: /mnt/c/users/dipap/Documents/tcr/fastq_files_backup

The names of the fastqs are: 'C231130001sCTR1_001.fastq.gz' 'C231130001sCTR2_001.fastq.gz' (this is one sample with 2 paired-end read files, to test my code)

My vdj.mapfile has these 4 columns:

C231130001 /mnt/c/users/dipap/Documents/tcr/fastq_files_backup/ C231130001 /mnt/c/users/dipap/Documents/tcr//matched_dir

(matched_dir exists in the ./tcr directory)
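
A mapfile like the one above can be sanity-checked before it is handed to multi_flv_trust4. The sketch below is not part of CeleScope; `check_mapfile` is a hypothetical helper. It assumes whitespace-separated columns, treats column 3 as a sample name and column 4 as the matched scRNA-Seq directory (as in the mapfile shown here), and only verifies that columns 2 and 4 are existing directories.

```python
import os


def check_mapfile(path):
    """Return a list of human-readable problems found in a 4-column mapfile."""
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) < 4:
                problems.append(
                    f"line {number}: expected 4 columns, got {len(fields)}"
                )
                continue
            prefix, fastq_dir, sample, matched_dir = fields[:4]
            # Column 2 must be the directory holding the fastqs, not a fastq file.
            if not os.path.isdir(fastq_dir):
                problems.append(
                    f"line {number}: column 2 is not a directory: {fastq_dir}"
                )
            # Column 4 is assumed to be the matched scRNA-Seq directory.
            if not os.path.isdir(matched_dir):
                problems.append(
                    f"line {number}: column 4 is not a directory: {matched_dir}"
                )
    return problems
```

An empty list from `check_mapfile("./vdj.mapfile")` means only that the paths exist; it does not confirm that the matched directory contains usable scRNA-Seq results.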

Thanks in advance for the help

zhouyiqi91 commented 6 months ago

Have you run the scRNA-Seq data? You need the scRNA-Seq cell barcodes to run the flv_trust4 pipeline.

multi_{assay} only generates the shell scripts in the shell folder; it does not actually run the pipeline.
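
Since the generated scripts still have to be executed, one possible way to launch them is sketched below. This is an illustration rather than an official CeleScope entry point: `run_generated_scripts` is a hypothetical helper, and it assumes the scripts are `*.sh` files in a folder named `shell` that can be run with `bash`.

```python
import pathlib
import subprocess


def run_generated_scripts(shell_dir="shell"):
    """Run every *.sh file in shell_dir with bash; return {script name: exit code}."""
    results = {}
    for script in sorted(pathlib.Path(shell_dir).glob("*.sh")):
        # Scripts run one after another; a non-zero exit code marks a failed sample.
        completed = subprocess.run(["bash", str(script)])
        results[script.name] = completed.returncode
    return results
```

Running a single script by hand with `bash` from the directory where multi_flv_trust4 was invoked should be equivalent for one sample.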

diopapamath commented 6 months ago

Hello, I thought demultiplexing was part of multi_flv_trust4. We have performed siCircle for 6 samples at Singleron Germany and we have the fastq files. Yes, I see the commands in the shell script, but I don't have anything else apart from the fastqs and the html report. Thank you!!

panapapa14 commented 5 months ago

Hello there! The problem @diopapamath and I have is that we only find guidelines for multi_{assay}, which, as you said, just creates a plain shell folder. What about the actual pipeline, in terms of the script that we need to execute?