nf-core / crisprseq

A pipeline for the analysis of CRISPR edited data. It allows the evaluation of the quality of gene editing experiments using targeted next generation sequencing (NGS) data (`targeted`) as well as the discovery of important genes from knock-out or activation CRISPR-Cas9 screens using CRISPR pooled DNA (`screening`).
https://nf-co.re/crisprseq
MIT License

MAGeCK MLE fails with non-trivial design matrix. #211

Open andrewholding opened 1 month ago

andrewholding commented 1 month ago

Description of the bug

Using any design matrix with more than 3 samples, the pipeline exits from MAGeCK MLE with the following error.

Command error:
INFO @ Wed, 16 Oct 2024 15:53:18: Parameters: /usr/local/bin/mageck mle --threads 6 -k count_table.count.txt -n designmatrix-anh004 -d designmatrix-anh004.txt
INFO @ Wed, 16 Oct 2024 15:53:23: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO @ Wed, 16 Oct 2024 15:53:23: Design matrix:
INFO @ Wed, 16 Oct 2024 15:53:23: [[1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]]
INFO @ Wed, 16 Oct 2024 15:53:23: Beta labels:baseline,common,hypoxiaVsNormoxia
INFO @ Wed, 16 Oct 2024 15:53:23: Included samples:day0-1,day0-2,control1,control2,hypoxia1,hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded samples:day0-1;day0-2;control1;control2;hypoxia1;hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Sample index: 4;5;2;3;0;1
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded 182 genes.
Error loading line 218
Error loading line 521
Error loading line 907
Traceback (most recent call last):
  File "/usr/local/bin/mageck", line 66, in <module>
    main();
  File "/usr/local/bin/mageck", line 43, in main
    args=crisprseq_parseargs();
  File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
    mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
  File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 74, in mageckmle_main
    allgenedict=read_gene_from_file(args.count_table,includesamples=args.include_samples)
  File "/usr/local/lib/python3.9/site-packages/mageck/mleinstanceio.py", line 84, in read_gene_from_file
    ginst.nb_count=np.matrix(ginst.nb_count)
  File "/usr/local/lib/python3.9/site-packages/numpy/matrixlib/defmatrix.py", line 145, in __new__
    arr = N.array(data, dtype=dtype, copy=copy)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
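The `Error loading line 218`, `521` and `907` messages are printed just before the traceback, so those rows of `count_table.count.txt` may be worth inspecting. Below is a minimal sketch for listing rows that do not parse cleanly. It assumes the count table has a header row, then an sgRNA id, a gene name, and one numeric count per sample, and that fields are split on any whitespace by default. That layout and splitting rule are assumptions, not something confirmed in this report. The helper name `find_bad_count_rows` is hypothetical and is not part of MAGeCK or nf-core/crisprseq, and the toy table is made-up data.

```python
import io


def find_bad_count_rows(lines, delimiter=None):
    """Return (line_number, fields) for rows that do not parse cleanly.

    Assumed layout: header row, then sgRNA id, gene, and one numeric
    count per sample. delimiter=None splits on any run of whitespace.
    """
    bad = []
    expected = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.strip().split(delimiter)
        if expected is None:
            expected = len(fields)  # the header defines the column count
            continue
        ok = len(fields) == expected
        if ok:
            try:
                for value in fields[2:]:
                    float(value)  # every sample column must be numeric
            except ValueError:
                ok = False
        if not ok:
            bad.append((number, fields))
    return bad


# Toy table (made-up data): the third line has a space inside the gene name,
# which shifts the columns when splitting on whitespace.
example = io.StringIO(
    "sgRNA\tGene\thypoxia1\thypoxia2\n"
    "g1\tATP1A1\t10\t12\n"
    "g2\tATP1A2 alt\t7\t9\n"
)
for number, fields in find_bad_count_rows(example):
    print(number, fields)
```

To run it on a real file, pass an open file handle, for example `find_bad_count_rows(open("count_table.count.txt"))`. If a value is skipped for some samples but not others, the per-sample count lists would end up with different lengths, which would be consistent with the "inhomogeneous shape" that NumPy reports. This is a guess rather than a confirmed diagnosis.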

The following inputs were used:

Sample sheet:
sample,fastq_1,fastq_2,condition
RM-231Hxn1,./seqdata231/RM-231Hxn1_R1_001.fastq.gz,,hypoxia1
RM-231Hxn2,./seqdata231/RM-231Hxn2_R1_001.fastq.gz,,hypoxia2
RM-231Nxn1,./seqdata231/RM-231Nxn1_R1_001.fastq.gz,,control1
RM-231Nxn2,./seqdata231/RM-231Nxn2_R1_001.fastq.gz,,control2
RM-231T0n1,./seqdata231/RM-231T0n1_R1_001.fastq.gz,,day0-1
RM-231T0n2,./seqdata231/RM-231T0n2_R1_001.fastq.gz,,day0-2

Guide library (head):
id   target transcript     gene symbol
1    ACCAGGGGAGCCAAGTGGA   ATP1A1
2    GAAGGAGCCCCGAACCCGG   ATP1A1
3    ggcggacacgtggcaacag   ATP1A1
4    GAGGGAGCGCAGTAACGGG   ATP1A1
5    acagcggtagcagcccggg   ATP1A1
6    CCAGCCCGTCTGGGACAGT   ATP1A2
7    GGGCTGTGGGTCTAACTGT   ATP1A2
8    AGGGAAGGACTAGAGATGT   ATP1A2
9    AGCCCACACCAGCCCGTCT   ATP1A2

Design Matrix:
Samples    baseline   common   hypoxiaVsNormoxia
day0-1     1          1        0
day0-2     1          1        0
control1   1          0        0
control2   1          0        0
hypoxia1   1          1        1
hypoxia2   1          1        1
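The design matrix rows are labelled with the values from the sample sheet's `condition` column (`day0-1`, `control1`, and so on). Below is a minimal sketch for cross-checking the two inputs before launching the pipeline. It assumes a whitespace-delimited design matrix whose first column holds the sample labels, and a CSV sample sheet with a `condition` column, as in the inputs above. The function name `check_design_matrix` is hypothetical and is not part of the pipeline.

```python
import csv
import io


def check_design_matrix(design_text, samplesheet_text):
    """Return a list of human-readable problems (empty if none are found)."""
    rows = [line.split() for line in design_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    reader = csv.DictReader(io.StringIO(samplesheet_text.strip()))
    conditions = {record["condition"] for record in reader}

    problems = []
    for row in body:
        label, values = row[0], row[1:]
        # Each row needs one value per coefficient column in the header.
        if len(row) != len(header):
            problems.append(
                f"{label}: expected {len(header) - 1} values, got {len(values)}"
            )
        # Each row label should name a condition from the sample sheet.
        if label not in conditions:
            problems.append(
                f"{label}: not found in the sample sheet 'condition' column"
            )
    # Conditions that never appear as a design matrix row.
    for label in sorted(conditions - {row[0] for row in body}):
        problems.append(f"{label}: in the sample sheet but has no design matrix row")
    return problems


design = """\
Samples baseline common hypoxiaVsNormoxia
day0-1 1 1 0
day0-2 1 1 0
control1 1 0 0
control2 1 0 0
hypoxia1 1 1 1
hypoxia2 1 1 1
"""

samplesheet = """\
sample,fastq_1,fastq_2,condition
RM-231Hxn1,./seqdata231/RM-231Hxn1_R1_001.fastq.gz,,hypoxia1
RM-231Hxn2,./seqdata231/RM-231Hxn2_R1_001.fastq.gz,,hypoxia2
RM-231Nxn1,./seqdata231/RM-231Nxn1_R1_001.fastq.gz,,control1
RM-231Nxn2,./seqdata231/RM-231Nxn2_R1_001.fastq.gz,,control2
RM-231T0n1,./seqdata231/RM-231T0n1_R1_001.fastq.gz,,day0-1
RM-231T0n2,./seqdata231/RM-231T0n2_R1_001.fastq.gz,,day0-2
"""

for problem in check_design_matrix(design, samplesheet):
    print(problem)
```

The check only covers label matching and row width. It says nothing about whether the matrix is a statistically sensible design for MAGeCK MLE.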

Command used and terminal output

nextflow run nf-core/crisprseq --analysis screening --input $sampleSheet -profile apptainer \
    --library $guideLibrary --outdir $output \
    --mle_design_matrix $designMatrix \
    -w work-combined-day0

Relevant files

N E X T F L O W ~ version 23.10.0
Launching https://github.com/nf-core/crisprseq [loving_planck] DSL2 - revision: b2c583a3fd [master]
WARN: Nextflow self-contained distribution allows only core plugins -- User config plugins will be ignored: nf-validation@1.1.3
WARN: Access to undefined parameter reference_fasta -- Initialise it to a default value eg. params.reference_fasta = some_value
WARN: Access to undefined parameter monochromeLogs -- Initialise it to a default value eg. params.monochromeLogs = some_value


nf-core/crisprseq v2.2.1-gb2c583a

Core Nextflow options
  revision       : master
  runName        : loving_planck
  containerEngine: apptainer
  launchDir      : /mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM
  workDir        : /mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM/work-combined-day0
  projectDir     : /users/anh524/.nextflow/assets/nf-core/crisprseq
  userName       : anh524
  profile        : apptainer
  configFiles    :

Input/output options
  input          : settings/samplesheet-anh003.csv
  outdir         : anh231-mle-day0
  analysis       : screening

Screening parameters
  library          : settings/guide_library-anh001.tsv
  mle_design_matrix: settings/designmatrix-anh004.txt

!! Only displaying parameters that differ from the pipeline defaults !!

If you use nf-core/crisprseq for your analysis please cite:

Caused by: Process NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX (designmatrix-anh004) terminated with an error exit status (1)

Command executed:

mageck \
    mle \
    \
    --threads 6 \
    -k count_table.count.txt \
    -n designmatrix-anh004 \
    -d designmatrix-anh004.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX":
    mageck: $(mageck -v)
END_VERSIONS

Command exit status: 1

Command output:
  Error loading line 218
  Error loading line 521
  Error loading line 907

Command error:
  INFO @ Wed, 16 Oct 2024 15:53:18: Parameters: /usr/local/bin/mageck mle --threads 6 -k count_table.count.txt -n designmatrix-anh004 -d designmatrix-anh004.txt
  INFO @ Wed, 16 Oct 2024 15:53:23: Cannot parse design matrix as a string; try to parse it as a file name ...
  INFO @ Wed, 16 Oct 2024 15:53:23: Design matrix:
  INFO @ Wed, 16 Oct 2024 15:53:23: [[1. 1. 0.]
  INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 0.]
  INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
  INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
  INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]
  INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]]
  INFO @ Wed, 16 Oct 2024 15:53:23: Beta labels:baseline,common,hypoxiaVsNormoxia
  INFO @ Wed, 16 Oct 2024 15:53:23: Included samples:day0-1,day0-2,control1,control2,hypoxia1,hypoxia2
  INFO @ Wed, 16 Oct 2024 15:53:23: Loaded samples:day0-1;day0-2;control1;control2;hypoxia1;hypoxia2
  INFO @ Wed, 16 Oct 2024 15:53:23: Sample index: 4;5;2;3;0;1
  INFO @ Wed, 16 Oct 2024 15:53:23: Loaded 182 genes.
  Error loading line 218
  Error loading line 521
  Error loading line 907
  Traceback (most recent call last):
    File "/usr/local/bin/mageck", line 66, in <module>
      main();
    File "/usr/local/bin/mageck", line 43, in main
      args=crisprseq_parseargs();
    File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
      mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
    File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 74, in mageckmle_main
      allgenedict=read_gene_from_file(args.count_table,includesamples=args.include_samples)
    File "/usr/local/lib/python3.9/site-packages/mageck/mleinstanceio.py", line 84, in read_gene_from_file
      ginst.nb_count=np.matrix(ginst.nb_count)
    File "/usr/local/lib/python3.9/site-packages/numpy/matrixlib/defmatrix.py", line 145, in __new__
      arr = N.array(data, dtype=dtype, copy=copy)
  ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

Work dir: /mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM/work-combined-day0/20/ebfc47d70920ccc3aa3c2ea08f6da1

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details

System information

No response