shendurelab / MPRAflow

A portable, flexible, parallelized tool for complete processing of massively parallel reporter assay data
Apache License 2.0

Process `create_BAM` input file name collision -- There are multiple input files for each of the following file names: null #62

Closed Will19902225 closed 2 years ago

Will19902225 commented 2 years ago

Hi, my command:

nextflow run --w /home/huanglabdell/Documents/MPRAflow/countexamplework count.nf --experiment-file "/home/huanglabdell/Documents/MPRAflow/fastQFolder/experiment.csv" --dir "/home/huanglabdell/Documents/MPRAflow/fastQFolder" --outdir "/home/huanglabdell/Documents/MPRAflow/outputexample1"

N E X T F L O W ~ version 22.04.2
Launching count.nf [nostalgic_fermat] DSL1 - revision: d86f5331f8
Running MPRAflow count without design file.
Running MPRAflow count without association file.

                                      ,--./,-.
      ___     __   __   __   ___     /,-._.--~'
|\ | |__  __ /  ` /  \ |__) |__         }  {
| \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                      `._,._,'

MPRAflow v2.3.1"

Pipeline Name : shendurelab/MPRAflow
Pipeline Version: 2.3.1
Run Name : nostalgic_fermat
Output dir : /home/huanglabdell/Documents/MPRAflow/outputexample1
Working dir : /home/huanglabdell/Documents/MPRAflow/work
Current home : /home/huanglabdell
Current user : huanglabdell
Current path : /home/huanglabdell/Documents/MPRAflow
Script dir : /home/huanglabdell/Documents/MPRAflow
Config Profile : standard
Experiment File: /home/huanglabdell/Documents/MPRAflow/fastQFolder/experiment.csv
reads : DataflowQueue(queue=[])
UMIs : Reads with UMI
BC length : 15
BC threshold : 10
mprAnalyze : false

start analysis
[- ] process > create_BAM -
[- ] process > raw_counts -
[- ] process > filter_counts -
[- ] process > final_counts -
[- ] process > dna_rna_merge_counts -
Error executing process > 'create_BAM (make idx)'

Caused by: Process create_BAM input file name collision -- There are multiple input files for each of the following file names: null

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

How can I fix this error?

LX

visze commented 2 years ago

You are missing two important options: first, the association file (a pickle file) generated by the association workflow, and second, the design file in FASTA format.

--design here_your_design_file.fa --association here_your_bc_map.pickle

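Further, I noticed that the option for changing the work directory is -w, not --w. Nextflow treats double-dash options as pipeline parameters, so --w did not set the work directory. In your case the default /home/huanglabdell/Documents/MPRAflow/work was used and not /home/huanglabdell/Documents/MPRAflow/countexamplework.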
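Putting both points together, the corrected call could look roughly like the sketch below. It only prints the command (a dry run) so it can be checked before launching. The design and association file names are placeholders (assumptions) and must be replaced with real paths; the other paths are taken from the original post.

```shell
#!/bin/sh
# Dry-run sketch: assemble the corrected count.nf call and print it.
# DESIGN and ASSOC are placeholder names (assumptions), not real files.
BASE="/home/huanglabdell/Documents/MPRAflow"
WORK_DIR="$BASE/countexamplework"
EXPERIMENT="$BASE/fastQFolder/experiment.csv"
FASTQ_DIR="$BASE/fastQFolder"
OUT_DIR="$BASE/outputexample1"
DESIGN="here_your_design_file.fa"
ASSOC="here_your_bc_map.pickle"

build_count_cmd() {
  # -w (single dash) is the Nextflow option for the work directory;
  # the double-dash options are MPRAflow pipeline parameters.
  printf '%s\n' "nextflow run -w \"$WORK_DIR\" count.nf --experiment-file \"$EXPERIMENT\" --dir \"$FASTQ_DIR\" --outdir \"$OUT_DIR\" --design \"$DESIGN\" --association \"$ASSOC\""
}

build_count_cmd
```

Once the printed command looks right, it can be copied into the terminal and run as is.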

Will19902225 commented 2 years ago

Thanks for your response. It is working now, but new issues are appearing. I am trying to reproduce the workflow following the protocol at https://mpraflow.readthedocs.io/en/latest/index.html and the Nature Protocols paper. I am using the data sets SRR10800881, SRR10800882, SRR10800883, SRR10800884, SRR10800885 and SRR10800886. I have tried many times. association.nf works well, but count.nf always fails, and the output files from count.nf are very small.

Here is the command:

(MPRAflow) huanglabdell@huanglabdell-Precision-7820-Tower:~/Documents/MPRAflow$ nextflow run count.nf --e "/home/huanglabdell/Documents/MPRAflow/fastQFolder/experiment.csv" --dir "/home/huanglabdell/Documents/MPRAflow/fastQFolder" --outdir "/home/huanglabdell/Documents/MPRAflow/fastQFolder/519output" --design "/media/huanglabdell/NGS_data_d3/NGS_data_drive_3_2022/MPRA_sequence/MPRAflow/Assoc_Basic/data/design.fa" --association "/media/huanglabdell/NGS_data_d3/NGS_data_drive_3_2022/MPRA_sequence/MPRAflow/Assoc_Basic/output/assoc_basic/assoc_basic_filtered_coords_to_barcodes.pickle" --m 1 -resume --mpranalyze

Error executing process > 'generate_mpranalyze_inputs (1)'

Caused by: Process generate_mpranalyze_inputs (1) terminated with an error exit status (1)

Command executed:

python /home/huanglabdell/Documents/MPRAflow/src/mpranalyze_compiler.py HEPG2_final_labeled_counts.txt

Command exit status: 1

Command output: (empty)

visze commented 2 years ago

This might be an issue with creating the MPRAnalyze files. I will test it on our side and come back to you. Without --mpranalyze, it should run without any errors.

Will19902225 commented 2 years ago

I have figured it out. After downloading the data, the files need to be compressed with "gzip -c FileX.fastq > FileX.fastq.gz". Plain .fastq files never work, while .fastq.gz files work very well. By the way, if I get raw data from the sequencing facility with 150 bp reads, how do you extract the 15 bp barcodes or UMIs from the raw data?

visze commented 2 years ago

Yes, the FASTQ input has to be gzipped! Otherwise it will not work.
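For a folder full of downloaded reads, the same gzip call can be applied in a loop. A minimal sketch, assuming all uncompressed files end in .fastq and sit in one directory (the function name gzip_fastqs is made up for illustration):

```shell
#!/bin/sh
# Compress every *.fastq file in a directory to *.fastq.gz.
# The originals are kept, mirroring "gzip -c FileX.fastq > FileX.fastq.gz".
gzip_fastqs() {
  dir="$1"
  for f in "$dir"/*.fastq; do
    # Skip the unexpanded pattern when the directory has no .fastq files.
    [ -e "$f" ] || continue
    gzip -c "$f" > "$f.gz"
  done
}
```

Calling gzip_fastqs with the FASTQ folder as its argument leaves a .fastq.gz file next to each original. The experiment file then has to list the .fastq.gz names.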

In your case, you have to do adapter trimming before running MPRAflow. Several tools can do this, e.g. Trimmomatic.
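To illustrate the idea only: adapter trimming cuts each read at the point where a known adapter sequence starts, so a 150 bp read shrinks to the insert in front of it. The sketch below does this for exact adapter matches with awk. It assumes standard four-line FASTQ records and an adapter sequence you know from your library design; the function name trim_adapter is made up for illustration. Dedicated trimmers also handle mismatches, partial adapters and paired-end reads, so this is not a replacement for them.

```shell
#!/bin/sh
# Naive 3' adapter trimming: cut each read (and its quality string)
# at the first exact occurrence of the adapter given as $1.
# Reads FASTQ on stdin, writes trimmed FASTQ to stdout.
trim_adapter() {
  awk -v ad="$1" '
    NR % 4 == 2 {                      # sequence line
      pos = index($0, ad)              # 0 if the adapter is absent
      keep = (pos > 0) ? pos - 1 : length($0)
      print substr($0, 1, keep)
      next
    }
    NR % 4 == 0 {                      # quality line, same length as sequence
      print substr($0, 1, keep)
      next
    }
    { print }                          # header and "+" lines unchanged
  '
}
```

With gzipped input this could be used as "gzip -dc in.fastq.gz | trim_adapter ADAPTERSEQ | gzip -c > out.fastq.gz", where the file names and ADAPTERSEQ are placeholders.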

visze commented 2 years ago

Closing because of inactivity.

Please reopen if needed.