nf-core / crisprseq

A pipeline for the analysis of CRISPR edited data. It allows the evaluation of the quality of gene editing experiments using targeted next generation sequencing (NGS) data (`targeted`) as well as the discovery of important genes from knock-out or activation CRISPR-Cas9 screens using CRISPR pooled DNA (`screening`).
https://nf-co.re/crisprseq
MIT License

Add --control-sgrna parameter for mageck count #63

Closed zhouzhendiao closed 9 months ago

zhouzhendiao commented 1 year ago

Description of feature

I have some non-targeting sgRNAs in my library.

mageck count provides the --control-sgrna parameter for generating the null distribution.

Q: What does the --control-sgrna CONTROL_SGRNA option do? How to use this option?

A: This option tells MAGeCK to use provided negative control sgRNAs to generate the null distribution when calculating the p values. If this option is not specified, MAGeCK generates the null distribution of RRA scores by assuming all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve this if you know some genes are not essential. By providing the corresponding sgRNA IDs in the --control-sgrna option, MAGeCK will have a better estimation of p values.

Could you kindly add this parameter? Thanks!
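
For readers preparing such a control list, here is a minimal Python sketch of how one might pull the non-targeting sgRNA IDs out of a library table and write them one ID per line. The column names (`id`, `sequence`, `gene`), the `Non-Targeting` label, the example rows, and the helper names are all assumptions made for illustration; they are not taken from the pipeline or from MAGeCK.

```python
import csv
import io
import os
import tempfile

# Hypothetical library table. The column names and the "Non-Targeting"
# gene label are assumptions for illustration only.
LIBRARY_CSV = """id,sequence,gene
sgRNA00001,ACGTACGTACGTACGTACGT,TP53
sgRNA00002,TTGACCGGTAACGTTAGCCA,TP53
sgRNA56555,GATTACAGATTACAGATTAC,Non-Targeting
sgRNA56556,CCATGGTTAACCGGTTAACC,Non-Targeting
"""


def control_ids(library_text, control_label="Non-Targeting"):
    """Return the IDs of rows whose gene column equals control_label."""
    reader = csv.DictReader(io.StringIO(library_text))
    return [row["id"] for row in reader if row["gene"] == control_label]


def write_control_list(ids, path):
    """Write one sgRNA ID per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(ids) + "\n")


ids = control_ids(LIBRARY_CSV)

# Write to a temporary directory so the sketch leaves no files behind.
out_path = os.path.join(tempfile.mkdtemp(), "control_sgrnas.txt")
write_control_list(ids, out_path)

with open(out_path) as handle:
    written = handle.read()

print(ids)
```

The resulting file is what would be passed to --control-sgrna; adapt the label and column names to however the non-targeting guides are marked in your own library.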

LaurenceKuhl commented 1 year ago

Hi @zhouzhendiao! This is already possible with a user.config profile :) Could you please create a config file such as the following:

process {
    withName: MAGECK_MLE {
        ext.args = '--control-sgrna "your-control-sgrna-file"'
    }
}

Then specify -c user.config on the command line.

Let me know how it goes :) Best, Laurence

zhouzhendiao commented 1 year ago

Hi @LaurenceKuhl ,

I will try this later. Thanks for replying!

LaurenceKuhl commented 9 months ago

Hi, I will close this issue. Please feel free to reopen it if anything is unclear.

jeremymsimon commented 1 month ago

Hi @LaurenceKuhl - my understanding is that these sorts of extra pipeline-specific parameters are best suited for the -params-file rather than the -c config.yml specification. Is it possible to implement something where we would specify the above as:

extra_mageck_mle_args: >-
  --control-sgrna "gRNA_CONTROL_IDs.tsv"

or similar within a supplied -params-file?

jeremymsimon commented 1 month ago

Note that when trying the above as

process { 
  withName: 'MAGECK_MLE' { 
    ext.args =  '--control-sgrna "gRNA_annotated_geneSymbol_withID_CONTROLS.tsv" ' 
  } 
}

where my .tsv contains a list of control gRNAs like

sgRNA56555
sgRNA56556
sgRNA56557
sgRNA56558
sgRNA56559
sgRNA56560
sgRNA56561
sgRNA56562
sgRNA56563
sgRNA56564

I get a FileNotFoundError:

Caused by:
  Process `NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE (treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3)` terminated with an error exit status (1)

Command executed:

  mageck \
      mle \
      --control-sgrna "gRNA_annotated_geneSymbol_withID_CONTROLS.tsv"  \
      --threads 6 \
      -k count_table.count.txt \
      -n treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3     \
      -d treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3.txt

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE":
      mageck: $(mageck -v)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO  @ Wed, 24 Jul 2024 12:58:58: Parameters: /usr/local/bin/mageck mle --control-sgrna gRNA_annotated_geneSymbol_withID_CONTROLS.tsv --threads 6 -k count_table.count.txt -n treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3 -d treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3.txt
  INFO  @ Wed, 24 Jul 2024 12:58:59: Cannot parse design matrix as a string; try to parse it as a file name ...
  INFO  @ Wed, 24 Jul 2024 12:58:59: Design matrix:
  INFO  @ Wed, 24 Jul 2024 12:58:59: [[1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]]
  INFO  @ Wed, 24 Jul 2024 12:58:59: Beta labels:baseline,treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3
  INFO  @ Wed, 24 Jul 2024 12:58:59: Included samples:control_1,control_2,control_3,treated_7hr_1,treated_7hr_2,treated_7hr_3
  INFO  @ Wed, 24 Jul 2024 12:59:00: Loaded samples:control_1;control_2;control_3;treated_7hr_1;treated_7hr_2;treated_7hr_3
  INFO  @ Wed, 24 Jul 2024 12:59:00: Sample index: 6;7;8;3;4;5
  INFO  @ Wed, 24 Jul 2024 12:59:00: Loaded 18899 genes.
  Traceback (most recent call last):
    File "/usr/local/bin/mageck", line 66, in <module>
      main();
    File "/usr/local/bin/mageck", line 43, in main
      args=crisprseq_parseargs();
    File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
      mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
    File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 83, in mageckmle_main
      mageckcount_checkcontrolsgrna(args,sgrna2genelist)
    File "/usr/local/lib/python3.9/site-packages/mageck/mageckCount.py", line 457, in mageckcount_checkcontrolsgrna
      controlsglist=[line.strip() for line in open(args.control_sgrna)]
  FileNotFoundError: [Errno 2] No such file or directory: 'gRNA_annotated_geneSymbol_withID_CONTROLS.tsv'
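
One plausible reading of this traceback: MAGeCK opens the --control-sgrna value exactly as given, so a bare file name is resolved against the directory the command runs in. Nextflow runs each task in its own work directory, so a file that exists only in the launch directory would not be found there unless it is staged into the task or referenced by a path that is visible from it. The Python sketch below reproduces that behaviour outside the pipeline. The two temporary directories and the file name `controls.tsv` are stand-ins invented for illustration, and `read_control_ids` only mirrors the list comprehension shown in the traceback; it is not MAGeCK's own function.

```python
import os
import tempfile


def read_control_ids(path):
    """Read one sgRNA ID per line, mirroring the pattern in the traceback."""
    with open(path) as handle:
        return [line.strip() for line in handle]


# Stand-ins (assumptions): the directory the pipeline is launched from,
# and a separate per-task work directory.
launch_dir = tempfile.mkdtemp()
task_dir = tempfile.mkdtemp()

name = "controls.tsv"  # hypothetical file name
with open(os.path.join(launch_dir, name), "w") as handle:
    handle.write("sgRNA56555\nsgRNA56556\n")

original_cwd = os.getcwd()
os.chdir(task_dir)  # the command now runs somewhere else
try:
    try:
        read_control_ids(name)  # bare name, resolved against task_dir
        relative_ok = True
    except FileNotFoundError:
        relative_ok = False
    # An absolute path does not depend on the current directory.
    absolute_ids = read_control_ids(os.path.join(launch_dir, name))
finally:
    os.chdir(original_cwd)

print(relative_ok)
print(absolute_ids)
```

Whether an absolute path in ext.args is enough in a real run depends on the executor and container setup (for example, which host paths are mounted), which this thread does not specify.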