Is it okay to skip MergeUMI.py step?

hd00ljy commented 3 years ago

Hello!

I am trying to run 10xR2C2 on my C3POa postprocessed data.

I am wondering whether it is okay to skip ExtractUMIs.py and MergeUMIs.py (but not MergeUMI10x.py).

After I run C3POa_postprocessing.py with the following command:

time python ${C3POA_base}/C3POa_postprocessing.py \
 -i ${consensus} \
 -o ${odir2} \
 -c ${cfg} \
 -n ${thread} \
 -bt \
 -a ${adapter} \
 -b

I get the following files:

R2C2_full_length_consensus_reads.fasta
R2C2_full_length_consensus_reads_10X_sequences.fasta
R2C2_full_length_consensus_reads_left_splint.fasta
R2C2_full_length_consensus_reads_right_splint.fasta

I tried running ExtractUMIs.py and MergeUMIs.py on the resulting files with the following commands:

time python3 ${R2C2}/ExtractUMIs.py \
 -i5 ${odir2}/R2C2_full_length_consensus_reads_right_splint.fasta \
 -i3 ${odir2}/R2C2_full_length_consensus_reads_left_splint.fasta \
 -i ${odir2}/R2C2_full_length_consensus_reads.fasta \
 -o ${odir3_temp1}

time python3 ${R2C2}/MergeUMIs.py  \
 -f ${odir2}/R2C2_full_length_consensus_reads.fasta \
 -s ${odir1}/Splint1/R2C2_Subreads.fastq \
 -o ${odir3_temp2} \
 -u ${odir3_temp1}/R2C2_full_length_consensus_reads.UMI \
 -c ${cfg}

But after that, I found that the "R2C2_full_length_consensus_reads_UMI_merged.fasta" file produced by MergeUMIs.py does not have the same order of IDs as "R2C2_full_length_consensus_reads_10X_sequences.fasta" from C3POa_postprocessing.py. This seems to be because several consensus reads with the same splint UMI are merged into a single FASTA entry.

This led to problems in the demux step: all non-matching reads are discarded.

Could you help me with this issue? How can I match "R2C2_full_length_consensus_reads_10X_sequences.fasta" (the C3POa_postprocessing.py result) with "R2C2_full_length_consensus_reads_UMI_merged.fasta" (the MergeUMIs.py result)? Or is it okay to just skip the splint-UMI merging steps?
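To see how far the two files have drifted apart, one option is to compare their read IDs directly. The following is only a minimal sketch and is not part of C3POa or 10xR2C2: the helper names `fasta_ids` and `compare_ids` are made up for illustration, and it assumes that a read's ID is the first whitespace-delimited token on the `>` header line, which may not match how these tools actually name reads.

```python
def fasta_ids(path):
    """Return the ID of every FASTA record in file order.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def compare_ids(merged_fasta, tenx_fasta):
    """Summarize how the IDs of two FASTA files overlap (hypothetical helper)."""
    merged = fasta_ids(merged_fasta)
    tenx = fasta_ids(tenx_fasta)
    merged_set, tenx_set = set(merged), set(tenx)
    return {
        "shared": len(merged_set & tenx_set),
        "only_in_merged": sorted(merged_set - tenx_set),
        "only_in_10x": sorted(tenx_set - merged_set),
        "same_order": merged == tenx,  # True only if both lists are identical
    }
```

Calling `compare_ids("R2C2_full_length_consensus_reads_UMI_merged.fasta", "R2C2_full_length_consensus_reads_10X_sequences.fasta")` would then report how many IDs are shared and which ones appear in only one of the two files.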

rvolden commented 3 years ago

If you do the splint UMI merging step, you should run the postprocessing on the data again to get your updated sequences.
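One possible reading of this advice is to repeat the C3POa_postprocessing.py call from the original post with the UMI-merged FASTA as input. The sketch below only assembles that command. It reuses the flags exactly as they appear earlier in the thread, and it assumes (unconfirmed here) that the script accepts the merged FASTA through `-i` in the same way as the original consensus file. The function name `postprocessing_command` and its parameters are hypothetical.

```python
import shlex


def postprocessing_command(c3poa_base, merged_fasta, out_dir, config, threads, adapter):
    """Build the postprocessing call from the thread, pointed at the merged FASTA.

    Assumption: the flags behave as in the original post; only the -i input changes.
    """
    return [
        "python", f"{c3poa_base}/C3POa_postprocessing.py",
        "-i", str(merged_fasta),   # UMI-merged FASTA instead of the raw consensus file
        "-o", str(out_dir),
        "-c", str(config),
        "-n", str(threads),
        "-bt",
        "-a", str(adapter),
        "-b",
    ]


def as_shell(cmd):
    """Render the argument list as a single shell-quoted string for inspection."""
    return shlex.join(cmd)
```

Printing `as_shell(...)` before running anything makes it easy to check the command by eye. It could then be executed with `subprocess.run(cmd, check=True)`, writing to a fresh output directory so the earlier postprocessing results are not overwritten.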

hd00ljy commented 3 years ago

Thank you for your answer!

If that is the case, is it also possible to run in the following order?

C3POa.py -> ExtractUMIs.py/MergeUMIs.py -> C3POa_postprocessing.py

Additionally, could you share an example of the pipeline you used for doi: https://doi.org/10.1101/2020.01.10.902361, starting from C3POa.py and ending with the 10xR2C2 Seurat input?

bingwu2017 commented 3 years ago

I would like to second hd00ljy's request. In dire need of such a reference pipeline myself.
