nf-core / pangenome

Renders a collection of sequences into a pangenome graph.
https://nf-co.re/pangenome
MIT License

Won't accept sample sheet.csv #117

SarahBeecroft closed this issue 1 year ago

SarahBeecroft commented 1 year ago

Description of the bug

No matter what format I use inside my samplesheet.csv, I get the following error:

ERROR: Validation of pipeline parameters failed!

* --input: string [samplesheet.csv] does not match pattern ^\S+\.fn?a(sta)?(\.gz)?$ (samplesheet.csv)

I have copied the exact format from usage.md, as well as from the Python script that parses the sample sheet. It still fails even if I create dummy files whose names match those in the sample sheet.
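
For reference, the `--input` check is a pattern match on the file name string, `^\S+\.fn?a(sta)?(\.gz)?$`, as the error message shows ("string [samplesheet.csv] does not match pattern"). It accepts only names ending in `.fa`, `.fna`, `.fasta` or `.fnasta`, optionally followed by `.gz`, so a `.csv` name can never pass, whatever the file contains; this would explain why dummy files made no difference. The minimal Python sketch below copies the pattern verbatim from the error; the helper name `matches_input_pattern` is hypothetical and not part of the pipeline.

```python
import re

# Pattern copied verbatim from the --input validation error.
INPUT_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def matches_input_pattern(path: str) -> bool:
    """Return True if a file name would pass the --input pattern check.

    Only the name string is tested; the file's contents are never read.
    """
    return INPUT_PATTERN.match(path) is not None


if __name__ == "__main__":
    for name in ["samplesheet.csv", "genomes.fa", "genomes.fa.gz",
                 "genomes.fna.gz", "genomes.fasta.gz", "genomes.fastq.gz"]:
        print(f"{name}: {matches_input_pattern(name)}")
```

Running it prints `False` for `samplesheet.csv` and `genomes.fastq.gz`, and `True` for the four FASTA names.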

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:

    nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input samplesheet.csv --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length '700,900,1100' --outdir outdir
  2. See error:

    ERROR: Validation of pipeline parameters failed!

Expected behaviour

I expected the pipeline to read my sample sheet, since it follows the documented format.

Log files

Mar-24 12:20:11.820 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input samplesheet.csv --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length 700,900,1100 --outdir outdir
Mar-24 12:20:11.989 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 22.10.6
Mar-24 12:20:12.020 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/scratch/pawsey0012/sbeecroft/pangenome/plugins; core-plugins: nf-amazon@1.11.3,nf-azure@0.14.2,nf-codecommit@0.1.2,nf-console@1.0.4,nf-ga4gh@1.0.4,nf-google@1.4.5,nf-tower@1.5.6,nf-wave@0.5.3
Mar-24 12:20:12.036 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Mar-24 12:20:12.037 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Mar-24 12:20:12.045 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Mar-24 12:20:12.070 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Mar-24 12:20:12.091 [main] DEBUG nextflow.scm.ProviderConfig - Using SCM config path: /scratch/pawsey0012/sbeecroft/pangenome/scm
Mar-24 12:20:13.450 [main] DEBUG nextflow.scm.AssetManager - Git config: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/pangenome.git
Mar-24 12:20:13.469 [main] DEBUG nextflow.scm.RepositoryFactory - Found Git repository result: [RepositoryFactory]
Mar-24 12:20:13.478 [main] DEBUG nextflow.scm.AssetManager - Git config: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/pangenome.git
Mar-24 12:20:15.132 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config
Mar-24 12:20:15.132 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config
Mar-24 12:20:15.133 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config
Mar-24 12:20:15.134 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config
Mar-24 12:20:15.153 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `singularity`
Mar-24 12:20:16.437 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `singularity`
Mar-24 12:20:16.849 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [cfc_dev, ifb_core, denbi_qbic, genotoul, alice, mjolnir_globe, uppmax, abims, janelia, nihbiowulf, nu_genomics, oist, sahmri, mpcdf, leicester, lugh, vsc_ugent, sage, cambridge, unibe_ibu, vai, podman, czbiohub_aws, jax, cheaha, xanadu, ccga_med, test, scw, tigem, google, computerome, ipop_up, seg_globe, sanger, dkfz, pasteur, test_full, eddie, medair, azurebatch, bi, hki, bigpurple, sbc_sharc, adcra, crukmi, cedars, docker, engaging, gis, psmn, eva, ucl_myriad, utd_ganymede, charliecloud, fgcz, conda, crg, singularity, icr_davros, ceres, munin, arm, rosalind, prince, hasta, cfc, utd_sysbio, uzh, debug, genouest, cbe, ebc, ku_sund_dangpu, ccga_dx, crick, marvin, phoenix, gitpod, biohpc_gen, seawulf, shifter, mana, mamba, wehi, awsbatch, uct_hpc, imperial, maestro, aws_tower, binac]
Mar-24 12:20:16.896 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Mar-24 12:20:16.897 [main] INFO  nextflow.cli.CmdRun - Launching `https://github.com/nf-core/pangenome` [cheesy_stonebraker] DSL2 - revision: 2a17b1c088 [a_brave_new_world]
Mar-24 12:20:16.897 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Mar-24 12:20:16.897 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Mar-24 12:20:16.902 [main] DEBUG nextflow.secret.LocalSecretsProvider - Secrets store: /scratch/pawsey0012/sbeecroft/pangenome/secrets/store.json
Mar-24 12:20:16.906 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@59fea5f5] - activable => nextflow.secret.LocalSecretsProvider@59fea5f5
Mar-24 12:20:16.975 [main] DEBUG nextflow.Session - Session UUID: ec5c62f7-f7e1-4fec-849b-a400dd875127
Mar-24 12:20:16.975 [main] DEBUG nextflow.Session - Run name: cheesy_stonebraker
Mar-24 12:20:16.983 [main] DEBUG nextflow.Session - Executor pool size: 256
Mar-24 12:20:16.995 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=768; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Mar-24 12:20:17.045 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 22.10.6 build 5843
  Created: 23-01-2023 23:20 UTC (24-01-2023 07:20 AWDT)
  System: Linux 5.3.18-150300.59.87_11.0.78-cray_shasta_c
  Runtime: Groovy 3.0.13 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
  Encoding: UTF-8 (UTF-8)
  Process: 47930@setonix-01 [146.118.12.22]
  CPUs: 256 - Mem: 251.2 GB (192.2 GB) - Swap: 9.3 GB (8.5 GB)
Mar-24 12:20:17.084 [main] DEBUG nextflow.Session - Work-dir: /scratch/pawsey0012/sbeecroft/pangenome/work [lustre]
Mar-24 12:20:17.084 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/bin
Mar-24 12:20:17.102 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Mar-24 12:20:17.117 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Mar-24 12:20:17.248 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Mar-24 12:20:17.270 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 257; maxThreads: 1000
Mar-24 12:20:17.445 [main] DEBUG nextflow.Session - Session start
Mar-24 12:20:17.452 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow started -- trace file: /scratch/pawsey0012/sbeecroft/pangenome/outdir/pipeline_info/execution_trace_2023-03-24_12-20-16.txt
Mar-24 12:20:17.480 [main] DEBUG nextflow.Session - Using default localLib path: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib
Mar-24 12:20:17.487 [main] DEBUG nextflow.Session - Adding to the classpath library: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib
Mar-24 12:20:17.488 [main] DEBUG nextflow.Session - Adding to the classpath library: /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/lib/nfcore_external_java_deps.jar
Mar-24 12:20:18.709 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-24 12:20:18.785 [main] INFO  nextflow.Nextflow - 

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/pangenome v1.0dev-g2a17b1c
------------------------------------------------------
Core Nextflow options
  revision             : a_brave_new_world
  runName              : cheesy_stonebraker
  containerEngine      : singularity
  launchDir            : /scratch/pawsey0012/sbeecroft/pangenome
  workDir              : /scratch/pawsey0012/sbeecroft/pangenome/work
  projectDir           : /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome
  userName             : sbeecroft
  profile              : singularity
  configFiles          : /scratch/pawsey0012/sbeecroft/pangenome/assets/nf-core/pangenome/nextflow.config, /scratch/pawsey0012/sbeecroft/pangenome/nextflow.config

Input/output options
  input                : samplesheet.csv
  n_haplotypes         : 12
  outdir               : outdir

Wfmash Options
  wfmash_map_pct_id    : 70
  wfmash_segment_length: 2000
  wfmash_block_length  : null
  wfmash_sparse_map    : null
  wfmash_exclude_delim : null
  wfmash_temp_dir      : null

Seqwish Options
  seqwish_temp_dir     : null

Smoothxg options
  smoothxg_block_id_min: null
  smoothxg_temp_dir    : null

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/pangenome for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/pangenome/blob/master/CITATIONS.md
------------------------------------------------------
Mar-24 12:20:19.040 [main] ERROR nextflow.Nextflow - ERROR: Validation of pipeline parameters failed!
Mar-24 12:20:19.052 [main] ERROR nextflow.Nextflow - * --input: string [samplesheet.csv] does not match pattern ^\S+\.fn?a(sta)?(\.gz)?$ (samplesheet.csv)

Thanks!!

subwaystation commented 1 year ago

Hi, sorry for getting back to you so late. I must have missed this in my emails.

The pipeline does not accept a sample sheet, because its only input is a single FASTA file.

Please try

nextflow run nf-core/pangenome -r a_brave_new_world -profile singularity --input FASTA_INPUT.fa.gz --n_haplotypes 12 --wfmash_map_pct_id 70 --wfmash_segment_length 2000 --smoothxg_poa_length '700,900,1100' --outdir outdir

The FASTA file has to be compressed with bgzip beforehand.
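
A file compressed with plain `gzip` also ends in `.gz`, so it passes the name check even though it is not bgzip output. BGZF blocks are ordinary gzip members whose header sets the FEXTRA flag and carries an extra subfield with identifiers `B` and `C` (SLEN 2), as described in the SAM/BGZF specification. The Python sketch below inspects only the first block header to tell the two apart; the helper names `has_bgzf_header` and `is_bgzf` are hypothetical, and this is a quick sanity check rather than full validation.

```python
import struct
import sys


def has_bgzf_header(data: bytes) -> bool:
    """Return True if `data` starts with a BGZF block header."""
    # Fixed gzip header: magic 1f 8b, CM=8 (deflate), FLG with FEXTRA (0x04) set.
    if len(data) < 12 or data[:3] != b"\x1f\x8b\x08" or not data[3] & 0x04:
        return False
    # XLEN (little-endian uint16 at offset 10) gives the size of the extra field.
    (xlen,) = struct.unpack_from("<H", data, 10)
    extra = data[12:12 + xlen]
    if len(extra) < xlen:
        return False
    # Walk the subfields (SI1, SI2, SLEN, payload) looking for BGZF's 'B','C'.
    offset = 0
    while offset + 4 <= xlen:
        si1, si2 = extra[offset], extra[offset + 1]
        (slen,) = struct.unpack_from("<H", extra, offset + 2)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        offset += 4 + slen
    return False


def is_bgzf(path: str) -> bool:
    """Check the first block of a file on disk."""
    with open(path, "rb") as handle:
        # 12 fixed header bytes plus at most 65535 bytes of extra field.
        return has_bgzf_header(handle.read(12 + 65535))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        verdict = "BGZF" if is_bgzf(path) else "not BGZF"
        print(f"{path}: {verdict}")
```

For example, `python check_bgzf.py FASTA_INPUT.fa.gz` (script name hypothetical). If it reports "not BGZF", decompress the file and compress it again with bgzip.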

SarahBeecroft commented 1 year ago

No worries! That's working for me now, thanks. I got confused because the README currently mentions a sample sheet. Nice when it's a simple fix! Thanks :)

mictadlo commented 4 months ago

I have five genomes. Do I merge them and compress them with bgzip?

subwaystation commented 4 months ago

Put all sequences from all five genomes into one FASTA file. Ideally, rename the sequences so that their names follow the PanSN specification: https://github.com/pangenome/PanSN-spec.
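
A minimal Python sketch of that merge, assuming each input file is one haploid assembly (so every sequence gets haplotype `1`), that the sample name is supplied on the command line, and that `#` is the delimiter (the one PanSN recommends). Each header becomes `sample#haplotype#contig`, keeping only the first word of the original header as the contig name. The script and helper names are hypothetical. It also counts distinct `sample#haplotype` prefixes; presumably that is the value to pass as `--n_haplotypes`, but check the pipeline's parameter documentation.

```python
import sys
from typing import Iterable, Iterator, List


def pansn_rename(lines: Iterable[str], sample: str, haplotype: int = 1,
                 delim: str = "#") -> Iterator[str]:
    """Yield FASTA lines with each header rewritten as sample#haplotype#contig."""
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # keep records separate when files are concatenated
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"empty FASTA header in sample {sample!r}")
            # Keep only the first word of the original header as the contig name.
            yield f">{sample}{delim}{haplotype}{delim}{fields[0]}\n"
        else:
            yield line


def count_haplotypes(headers: Iterable[str], delim: str = "#") -> int:
    """Count distinct sample#haplotype prefixes among PanSN-style headers."""
    prefixes = set()
    for header in headers:
        sample, haplotype, _contig = header.lstrip(">").split(delim, 2)
        prefixes.add((sample, haplotype))
    return len(prefixes)


def main(args: List[str]) -> None:
    """Usage: merge_pansn.py SAMPLE=FILE.fa [SAMPLE=FILE.fa ...] > merged.fa"""
    headers: List[str] = []
    for arg in args:
        sample, path = arg.split("=", 1)
        with open(path) as handle:
            for line in pansn_rename(handle, sample):
                if line.startswith(">"):
                    headers.append(line)
                sys.stdout.write(line)
    print(f"haplotypes: {count_haplotypes(headers)}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python merge_pansn.py genomeA=genomeA.fa genomeB=genomeB.fa > pangenome.fa` followed by `bgzip pangenome.fa`, which replaces the file with `pangenome.fa.gz` for use as `--input`.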