Problems with using scSplitter

SuqinYang commented 2 years ago

Hi, @zzhu33. I am trying to use scSplitter to split my 10X scRNA-seq data, but I get the error "ERROR -2:invalid number of input files: 1". Could you please help me with this problem?

This is the command I ran: python3 ./scSplitter.py --ig True --f ./InputNames.txt --i $input --r ./output --ind $STARmm10index

InputNames.txt: J21050039_S4_L001_R1_001.fastq.gz J21050039_S4_L001_R2_001.fastq.gz

zzhu33 commented 2 years ago

Hello Suqin, are the input names tab-delimited? They need to be separated by a tab in order to be recognized as separate items. I don't see any other potential issues. Please let me know if that works out and if you run into any new problems.
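One quick way to check the delimiter is to split the line on tabs and count the items. The sketch below is only an illustration: `split_input_names` and `check_input_names` are hypothetical helpers, not part of scSplitter, and they assume the names file holds the file names on a single line as in the example above.

```python
def split_input_names(line):
    """Split one line of an input-names file on tabs only."""
    return [name for name in line.rstrip("\r\n").split("\t") if name]


def check_input_names(path):
    """Return the list of file names found on each non-empty line of the file."""
    with open(path) as handle:
        return [split_input_names(line) for line in handle if line.strip()]


space_separated = "J21050039_S4_L001_R1_001.fastq.gz J21050039_S4_L001_R2_001.fastq.gz\n"
tab_separated = "J21050039_S4_L001_R1_001.fastq.gz\tJ21050039_S4_L001_R2_001.fastq.gz\n"

print(len(split_input_names(space_separated)))  # 1: a space is not treated as a separator
print(len(split_input_names(tab_separated)))    # 2: R1 and R2 are separate items
```

A count of 1 for a line that should name both R1 and R2 would be consistent with the "invalid number of input files: 1" message.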

SuqinYang commented 2 years ago

Hi, @zzhu33. Thank you for your quick reply! When I use a tab as the separator, it works. But I ran into another problem; can you give me some advice on how to deal with it? This is the command I ran: python3 ./scSplitter.py --ig True --f ./InputNames.txt --i $input --r ./output --ind $STARmm10index

InputNames.txt: J21050037_S2_L001_R1_001.fastq.gz J21050037_S2_L001_R2_001.fastq.gz

Following is the error:
input directory: /WT1-12W
output directory: /WT1-12W/S2
fastqs : J21050037_S2_L001_R1_001.fastq.gz, J21050037_S2_L001_R2_001.fastq.gz, ...
STAR index : /refdata-gex-mm10-2020-A/star
barcode_length : 16
sampl.ind.len : 8
UMI len : 10
cellReadsCutoff : 3000
sampleReadsCutoff : 1000000
maxReadOffset : 500
select one read per UMI : False
ignore sample indices : True
number of lanes : 1
user confirmation of inputs : False
compressed inputs : True
keep SAMs : True
version : 10x v2 chemstry
QC mode : lenient
max pair offset : 50000000
221.03895568847656 GB memory available, running with 64 processes

decompressing inputs...
finished decompressing inputs, 306.0769877433777 s, 306.15743041038513 s total

making chunks...
split: /WT1-12W/S2/chunks/1_R1_chunk0000: Input/output error
finished making chunks, 181.5571072101593 s, 487.77571988105774 s total
deleting extracted inputs...
finished deleting

processing barcodes...
selecting barcodes...
processing lane 1 barcodes...
found 0 unique barcodes in lane 1, 0 selected with over 3000 reads, 0 out of 0 reads to be used, 0.020252466201782227 s

finished barcode selection, 0.3061368465423584 s, 491.4817953109741 s total
starting reads alignment...
aligning using 64 processes...

EXITING: FATAL INPUT ERROR: empty value for parameter "readFilesIn" in input "Command-Line"
SOLUTION: use non-empty value for this parameter

Apr 02 23:39:04 ...... FATAL ERROR, exiting
finished aligning , 1.1185228824615479 s

finished read alignment, 1.1189062595367432 s, 492.60070157051086 s total
splitting sam...
Traceback (most recent call last):
File "/software/scSplitter-1.0.1/scSplitter/scSplitter.py", line 495, in main()
File "/software/scSplitter-1.0.1/scSplitter/scSplitter.py", line 383, in main with open(samName) as fsm:
FileNotFoundError: [Errno 2] No such file or directory: 'lane_1Aligned.out.sam'

zzhu33 commented 2 years ago

Hello @SuqinYang, it looks like something went wrong during barcode processing. Can you provide a small portion of the read 1 fastq? For the default option (10x v2 chemistry), read 1 is assumed to be in the format described on page 70 of this document: https://assets.ctfassets.net/an68im79xiti/5pQQHnGnYvbafHfwpZk8mF/29cb1ba381b869c072df3739e340e5ef/CG000330_Chromium_Next_GEM_Single_Cell_5-v2_Cell_Surface_Protein_UserGuide_RevD.pdf

SuqinYang commented 2 years ago

Hi, @zzhu33. Thank you for your quick reply! Here is a small portion of the read 1 fastq: r1.zip

zzhu33 commented 2 years ago

Hello @SuqinYang, it looks like the format of the R1 file is not what's expected from a 10x Chromium v2 run. Read 1 should be only 26 nt, with the first 16 nt being the cell barcode and the remaining 10 nt the UMI. Are your results from a v1 chemistry run? You can try using --chver 1 to see if that generates reasonable results. Otherwise, a compatible R1 is needed.

pushpinder-bu commented 2 years ago

Hello @SuqinYang, I am getting the same error. Were you able to find a solution for it?

SuqinYang commented 2 years ago

Sorry, I haven't found a solution yet.

zzhu33 commented 2 years ago

@pushpinder-bu Hello, unfortunately the tool requires specific formats for the R1 file to extract barcodes and UMIs. Can you provide some example reads? If your cell barcode, UMI, or TSO (if applicable) lengths differ from the defaults, you will need to find out the format of your R1 and change the corresponding options. The expected format for --chver 2 is <16bp cell barcode><10bp UMI>; for --chver 1 it is <16bp cell barcode><10bp UMI><13bp TSO><rest of read 1>.
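To see which layout an R1 file follows, it can help to tabulate read lengths and slice a few reads by hand. This is a minimal sketch, not scSplitter code: `read_length_counts` and `split_read1` are hypothetical helpers, and the sketch assumes standard 4-line FASTQ records (no wrapped sequence lines) and the 16 nt barcode / 10 nt UMI defaults described above.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path, max_reads=10000):
    """Count sequence lengths in the first max_reads records of a FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 == 1:  # the sequence is the 2nd line of each record
                counts[len(line.strip())] += 1
    return counts


def split_read1(sequence, cb_len=16, umi_len=10):
    """Slice a read 1 sequence into (cell barcode, UMI, remainder)."""
    barcode = sequence[:cb_len]
    umi = sequence[cb_len:cb_len + umi_len]
    rest = sequence[cb_len + umi_len:]
    return barcode, umi, rest
```

For example, `read_length_counts("/path/to/R1.fastq.gz").most_common(3)` lists the dominant read lengths (the path is a placeholder). If the dominant length is not `cb_len + umi_len`, either the barcode and UMI length options need changing or the read carries extra sequence after the UMI.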

pushpinder-bu commented 2 years ago

@zzhu33 This is what my R1 file looks like: r1.fastq.gz

zzhu33 commented 2 years ago

@SuqinYang I should have caught this much earlier, but I just noticed that there is an I/O error from split in your output. Is your input directory (/WT1-12W) the full path? If not, please try using the full path instead, and also add --chver 1 since your reads appear to be paired. Your other paths also begin with "/", but I'm guessing they are not absolute paths? Please check those as well and see if that helps.

@pushpinder-bu It looks like your R1 is 28 bp, so you will need to use --chver 2 (the default option) and change --cb and --ul to match your cell barcode and UMI lengths. Also, please make sure to check your paths. It's safest to use full/absolute paths.
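The path checks suggested here can be done before launching a long run. The following is a rough sketch under stated assumptions: `check_paths` is a hypothetical helper, not part of scSplitter, and it only tests that the paths are absolute, that the input and index directories exist, and that the output location is writable.

```python
import os


def check_paths(input_dir, output_dir, index_dir):
    """Return a list of human-readable problems found with the given paths."""
    problems = []
    for label, path in (("input", input_dir), ("output", output_dir), ("index", index_dir)):
        if not os.path.isabs(path):
            problems.append(f"{label} path is not absolute: {path}")
    for label, path in (("input", input_dir), ("index", index_dir)):
        if not os.path.isdir(path):
            problems.append(f"{label} directory does not exist: {path}")
    # The output directory may not exist yet, so test the nearest existing parent.
    parent = os.path.abspath(output_dir)
    while not os.path.exists(parent):
        parent = os.path.dirname(parent)
    if not os.access(parent, os.W_OK):
        problems.append(f"cannot write under: {parent}")
    return problems
```

An empty list means none of these particular problems were found; it does not rule out other causes of an I/O error, such as a full disk or a failing file system.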

pushpinder-bu commented 2 years ago

@zzhu33 As you suggested, I changed --ul to 12 to match my UMI length and gave absolute paths for all inputs and outputs. Now I am getting the error below: FileNotFoundError: [Errno 2] No such file or directory: 'lane_1Aligned.out.sam'

zzhu33 commented 2 years ago

@pushpinder-bu It seems like there was a problem during the alignment step, but there could have been issues earlier as well. Can you provide the full output from the command?

pushpinder-bu commented 2 years ago

@zzhu33 Below is the full command: python3 scSplitter.py --ig True --sam --ul 12 --f /path/to/InputNames.txt --i /path/to/Fastq --ind /path/to/starsolo/

Repository: zzhu33 / scSplitter, issue #5