Closed emarti88 closed 3 years ago
Below is the entire message:
r1_right_cut = 10
r2_left_cut = 10
r2_right_cut = 10
quality_threshold = 20
length_threshold = 30
total_read_pairs_min = 1
total_read_pairs_max = 6000000
mapq_threshold = 10
num_upstr_bases = 0
num_downstr_bases = 2
compress_level = 5
unmapped_fastq = False
unmapped_param_str = ''
mode = 'mc'
barcode_version = 'V2'
r1_adapter = 'AGATCGGAAGAGCACACGTCTGAAC'
r2_adapter = 'AGATCGGAAGAGCGTCGTGTAGGGA'
bismark_reference = '/dcl01/FB2/data/personal/erafaelm/genomes/hg19/Bisulfite_Genome'
reference_fasta = '/dcl01/FB2/data/personal/erafaelm/genomes/hg19/genome.fa'
chrom_sizes_file = 'CHANGE_THIS_TO_YOUR_CHROM_SIZES_FILE'
mc_stat_feature = 'CHN CGN CCC'
mc_stat_alias = 'mCH mCG mCCC'
24 FASTQ file paths in input
Traceback (most recent call last):
  File "/users/erafaelm/.conda/envs/mapping/lib/python3.7/site-packages/cemba_data/demultiplex/fastq_dataframe.py", line 58, in _parse_v2_fastq_path
    assert primer_name[0] in 'ABCDEFGHIJKLMNOP'
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/users/erafaelm/.conda/envs/mapping/lib/python3.7/site-packages/cemba_data/demultiplex/fastq_dataframe.py", line 64, in _parse_v2_fastq_path
    raise ValueError
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/erafaelm/.conda/envs/mapping/bin/yap", line 8, in
Hi Eduardo,
Can you provide more information about your input data?
Please note that YAP is a pipeline designed specifically for snmC-seq based data generated in the Ecker Lab, and I have had to define many things specifically around our data generation process. Therefore, I don't expect this pipeline to be directly applicable to other single-cell methylome datasets. This is especially true of the demultiplexing step, for which I cannot offer general support for data generated outside my lab.
But the demultiplex step is essentially using cutadapt functions. You can see cutadapt documentation here: https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing
If you can get single-cell FASTQ files from your raw data, you may be able to use YAP to prepare snakemake files based on this: https://hq-1.gitbook.io/mc/mapping-form-cell-level-fastq-files
Best Hanqing
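For readers taking the cutadapt route suggested above, here is a minimal sketch of how a paired-end demultiplexing command could be assembled in Python. It follows the general pattern from the cutadapt demultiplexing guide (`-g file:...` plus `{name}` in the output paths). The function name, the barcode FASTA file name, the output layout, and the error-rate setting are all assumptions for illustration; they are not the settings YAP uses internally.

```python
import shlex
from pathlib import Path


def build_cutadapt_demux_cmd(r1, r2, barcode_fasta, out_dir, error_rate=0.0):
    """Assemble (but do not run) a cutadapt paired-end demultiplexing command.

    barcode_fasta is a hypothetical FASTA file with one record per inline
    barcode; cutadapt substitutes each record name for {name} in the outputs.
    """
    out_dir = Path(out_dir)
    return [
        "cutadapt",
        "-e", str(error_rate),           # allowed mismatch rate in the barcode
        "--no-indels",                   # barcodes must match without indels
        "-g", f"file:{barcode_fasta}",   # 5' barcodes on read 1, read from FASTA
        "-o", str(out_dir / "{name}-R1.fastq.gz"),
        "-p", str(out_dir / "{name}-R2.fastq.gz"),
        str(r1),
        str(r2),
    ]


cmd = build_cutadapt_demux_cmd(
    "lane1_R1.fastq.gz",
    "lane1_R2.fastq.gz",
    barcode_fasta="inline_barcodes.fasta",  # assumed file name
    out_dir="demux_output",                 # assumed output folder
)
# Print a copy-pasteable shell command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list keeps it easy to inspect before passing it to `subprocess.run`, and makes it simple to loop over several input folders.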
Hi Hanqing,
Thanks for your response. The libraries were prepared with the snmC-seq2 method as described by the Ecker lab. It was only a trial run, and we have only 96 cells for a single sample. We prepared the library with 12 different standard dual indexes. Those have been demuxed (so we have 12 different folders from that demuxing; hence the fastq pattern having multiple folders path//fastq), and each folder's fastq files should represent 8 cells. Those fastq files still need demuxing according to your 6 bp inline sequences.
Would it help to pool all the fastq files in a single folder? Or do you think there might be a problem with the config file? Please see above for the full output printed before the error.
Do you have any thoughts?
Thanks, Eduardo
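To make the layout described above concrete (12 index folders, each holding reads from 8 cells, for 96 cells in total), the following is a conceptual sketch of what inline-barcode demultiplexing does to each read. It assumes the 6 bp barcode sits at the very start of the read sequence and requires an exact match. The two barcodes shown are placeholders, not the real snmC-seq2 primer sequences, and a real run would use a tool such as cutadapt rather than this loop.

```python
from collections import defaultdict

BARCODE_LEN = 6

# Placeholder barcodes, NOT the real snmC-seq2 inline sequences.
BARCODE_TO_CELL = {"ACGTAC": "cell_A1", "TGCATG": "cell_A2"}


def assign_reads(reads, barcode_to_cell=BARCODE_TO_CELL, barcode_len=BARCODE_LEN):
    """Group (name, sequence, quality) reads by their leading inline barcode.

    The barcode is trimmed from the sequence and quality strings; reads whose
    prefix matches no known barcode are collected under "unassigned".
    """
    per_cell = defaultdict(list)
    for name, seq, qual in reads:
        cell = barcode_to_cell.get(seq[:barcode_len])
        if cell is None:
            per_cell["unassigned"].append((name, seq, qual))
        else:
            per_cell[cell].append((name, seq[barcode_len:], qual[barcode_len:]))
    return dict(per_cell)


example_reads = [
    ("read1", "ACGTACTTTTGG", "IIIIIIIIIIII"),
    ("read2", "TGCATGAAAACC", "IIIIIIIIIIII"),
    ("read3", "GGGGGGAAAACC", "IIIIIIIIIIII"),
]
demuxed = assign_reads(example_reads)
for cell, cell_reads in demuxed.items():
    print(cell, len(cell_reads))
```

Running the same assignment once per index folder would yield up to 8 cells per folder under the layout described in the message.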
Hi Eduardo,
I understand you are using the snmC-seq2 protocol. To demultiplex your fastq files, you can use the cutadapt demultiplex function, following this part of the documentation: https://hq-1.gitbook.io/mc/#important-note. This is also what I used for my data. Once you get the single-cell FASTQ file pairs, you can map them using this part of the documentation: https://hq-1.gitbook.io/mc/mapping-form-cell-level-fastq-files
As I noted in the documentation, this pipeline is customized for many ongoing projects in the lab, so I do not aim to provide support for general use cases because my time is limited. I hope you understand. But I am willing to discuss any problems you encounter when analyzing your snmC-seq2 data.
Best Hanqing
Hello,
I am running yap demultiplex with the following command:
yap demultiplex --fastq_pattern "/path/to/fastq_files/snmC-seq2//fastq.gz" --output_dir /outdir/demux_output --config_path ./hg19_mapping_config.txt --cpu 4
However, I keep getting the following error:
Message: 'No fastq name remained, check if the name pattern is correct.'
Am I formatting the fastq_pattern correctly? I am certain that is where the fastq files are. Do you have an exact example of how to format the fastq_pattern?
Thank you. Eduardo
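One way to sanity-check a `--fastq_pattern` value before running the pipeline is to expand it yourself and count the matches. The sketch below assumes the pattern is treated as a shell-style wildcard, as Python's `glob` module does; whether YAP expands patterns in exactly this way is an assumption. The folder and file names are made up, and the helper name `list_pattern_matches` is hypothetical.

```python
import glob
import tempfile
from pathlib import Path


def list_pattern_matches(pattern):
    """Return the sorted paths that a shell-style wildcard pattern matches."""
    return sorted(glob.glob(pattern))


# Build a throwaway layout: 12 index folders, each with an R1 and an R2 file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "snmC-seq2"
    for i in range(1, 13):
        folder = root / f"index_{i:02d}"  # hypothetical folder names
        folder.mkdir(parents=True)
        for read in ("R1", "R2"):
            (folder / f"sample_{read}.fastq.gz").touch()

    # One wildcard for the folder level, one for the file name.
    with_wildcards = list_pattern_matches(str(root / "*" / "*fastq.gz"))
    # No wildcards: this only matches a file literally named "fastq.gz".
    without_wildcards = list_pattern_matches(str(root / "fastq.gz"))

print(len(with_wildcards), "paths matched with wildcards")
print(len(without_wildcards), "paths matched without wildcards")
```

If the count from a check like this differs from the number of FASTQ files you expect, the pattern is the first thing to revisit. Quoting the pattern on the command line, as in the command above, keeps the shell from expanding it before the program sees it.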