shendurelab / MPRAflow

A portable, flexible, parallelized tool for complete processing of massively parallel reporter assay data
Apache License 2.0

No s_merged.bam output #70

Closed Minb88 closed 2 years ago

Minb88 commented 2 years ago

Hiya,

I'm running the association part of MPRAflow (v2.3.5) with Nextflow version 22.04.5. I get all the listed outputs except the BAM outputs (s_merged.bam). Is this expected? I also get an additional file, MPRAworkflow_barcodes_per_candidate-no_repeats-no_jackpots.feather, and I'm not sure what it is.

Appreciate any help.

Thanks

visze commented 2 years ago

Hey,

This is a particular behavior of the underlying workflow manager, Nextflow. All files are stored (in a not very intuitive way) under the work directory. If you want a file to appear in the output directory, you have to define that explicitly in the workflow file.

E.g. for the final filtered assignments we copy all outputs using this line: https://github.com/shendurelab/MPRAflow/blob/fb359522be58bf1ddafe45e313b98d582a43bf99/association.nf#L493

Nothing like that is defined for the process you are referring to: https://github.com/shendurelab/MPRAflow/blob/fb359522be58bf1ddafe45e313b98d582a43bf99/association.nf#L413-L438

So you have two options:

  1. Look in the work directory for your file.
  2. Add the following line to the process and rerun your workflow: publishDir "${params.outdir}/${params.name}", mode:'copy'
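
For option 1, a minimal Python sketch for locating the file could look like the following. It assumes the default Nextflow layout, where each task runs in its own subdirectory below `work`; the function name `find_in_work` is hypothetical and not part of MPRAflow or Nextflow.

```python
from pathlib import Path


def find_in_work(work_dir, filename="s_merged.bam"):
    """Return every file called `filename` below `work_dir`, newest first.

    Assumption: `work_dir` is the Nextflow work directory of the run.
    Staged inputs of downstream tasks are usually symlinks, so the same
    file may show up more than once.
    """
    matches = [p for p in Path(work_dir).rglob(filename) if p.is_file()]
    # Sort by modification time so the most recent run comes first.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_in_work("work"):
        print(path)
```

Run it from the directory where the workflow was launched, or pass the path given to Nextflow's `-w` option if a different work directory was used.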

visze commented 2 years ago

Feather is a portable file format for storing Arrow tables or data frames (from languages like Python or R). The file is generated so that the data can be loaded quickly into R or Python for further analysis (without parsing a csv or tsv file). More feather files are created, but the others are not copied because no output is defined for them in the workflow. Here is the part of the script that creates them:

https://github.com/shendurelab/MPRAflow/blob/fb359522be58bf1ddafe45e313b98d582a43bf99/src/nf_ori_map_barcodes.py#L154-L171

This particular file is basically a subset of the pickle file (with redundant barcodes and the 50 most common barcodes removed). So it is a really strict filter (especially because of the non-redundancy).
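
To illustrate what such a filter does, here is a small standalone Python sketch. It is not the code from nf_ori_map_barcodes.py: the function name `strict_filter`, the input format, and the reading of "redundant" as "a barcode observed with more than one candidate" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def strict_filter(pairs, n_jackpots=50):
    """Group barcodes by candidate after a strict filter.

    pairs: iterable of (barcode, candidate) observations (assumed format).
    Drops the `n_jackpots` most frequently observed barcodes and every
    barcode that was observed with more than one candidate.
    """
    pairs = list(pairs)
    counts = Counter(barcode for barcode, _ in pairs)
    candidates = defaultdict(set)
    for barcode, candidate in pairs:
        candidates[barcode].add(candidate)

    # "Jackpots": the most common barcodes overall.
    jackpots = {barcode for barcode, _ in counts.most_common(n_jackpots)}
    # "Repeats": barcodes that are not unique to a single candidate.
    repeats = {b for b, cands in candidates.items() if len(cands) > 1}

    kept = defaultdict(set)
    for barcode, candidate in pairs:
        if barcode not in jackpots and barcode not in repeats:
            kept[candidate].add(barcode)
    return dict(kept)
```

With a cutoff of 50, small test inputs are filtered out entirely, so pass a smaller `n_jackpots` when trying this on toy data.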

Minb88 commented 2 years ago

Thank you very much for clarifying