moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

config file #42

Closed ospfsg closed 7 months ago

ospfsg commented 7 months ago

I am trying to run grenepipe for the first time.

I run the following command from the directory where grenepipe is installed:

(grenepipe) dau1@frey:~/software/grenepipe-0.12.2$ snakemake -nq --cores 60 --use-conda --directory /mnt/data1/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run1/ --conda-prefix /home/dau1/software/conda-envs/

but I get this error message:

WorkflowError in line 28 of /home/dau1/software/grenepipe-0.12.2/rules/common.smk:
Config file is not valid JSON or YAML. In case of YAML, make sure to not mix whitespace and tab indentation.
File "/home/dau1/software/grenepipe-0.12.2/Snakefile", line 7, in <module>
File "/home/dau1/software/grenepipe-0.12.2/rules/common.smk", line 28, in <module>

Inside the run1 directory I have config.yaml and samples.tsv (generated with generate-table.py).

The config file is a minimally modified copy of the template file, edited with Sublime Text on Linux (Ubuntu).

Any suggestion?

lczech commented 7 months ago

Hi @ospfsg,

the error states that the config file is not valid, so the minimal edits seem to have broken something. Check that you do not mix tabs and spaces, for instance, and that everything is indented as in the original file.
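A quick way to look for stray tabs is a few lines of Python. This is only a minimal sketch using the standard library; the function name `find_tab_indentation` and the `config.yaml` path are illustrative and not part of grenepipe:

```python
from pathlib import Path


def find_tab_indentation(text):
    """Return (line_number, line) pairs whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indentation.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    # Hypothetical path: point this at the config.yaml in your run directory.
    config_path = Path("config.yaml")
    if config_path.exists():
        for number, line in find_tab_indentation(config_path.read_text()):
            print(f"line {number}: tab in indentation: {line!r}")
```

If this prints nothing, tabs are not the problem and the cause is more likely a structural edit somewhere in the file.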

Can you post the file here, so that I can have a look?

Cheers Lucas

lczech commented 7 months ago

Also, in reply to your emails, I'll answer here, as this might be interesting for others as well:

The --conda-prefix option:

As far as I understand, it is just a folder (always the same) to which snakemake downloads the environments, but it is not relevant or useful for the analysis...

Yes, this directory can be specified to avoid re-downloading all the packages from conda every time you run the pipeline. By using the same directory for every analysis that you run, you will save time. Other than that, it does not change the analysis.

I find the pipeline very useful, and if it works as I expect, I would like to ask for it to be installed at the Portuguese Network of Advanced Computing (https://rnca.fccn.pt/)

Thank you, happy to hear! As for installing: The things that need to be installed are conda and Snakemake, as described in the grenepipe wiki. Then, grenepipe itself just needs to be downloaded to somewhere to be used, but no installation step per-se is necessary.

For instance, for fastp I am using the extra params below. Is this the correct syntax?

# ----------------------------------------------------------------------
  #     fastp
  # ----------------------------------------------------------------------

  # Used only if settings:trimming-tool == fastp
  # See fastp manual: https://github.com/OpenGene/fastp
  fastp:
    threads: 100

    # Extra parameters for single reads.
    se: ""

    # Extra parameters for paired end reads.
    pe: "-g -l 80 -p -q 20"

It generally looks correct, but your email has messed up the indentation of the first line. Not sure if that is in your file as well. YAML is sensitive to that (similar to Python), and so the indentation needs to be kept as-is, and no tabs and spaces can be mixed.

I hope I replied to all your questions from the emails. If not, please let me know here :-)

Cheers Lucas

lczech commented 7 months ago

Hi @ospfsg,

in reply to your email:

Dear Lucas

I built a new config.yaml file and attached it. Again, minimal editing.

the same results:

(grenepipe) dau1@frey:~/software/grenepipe-0.12.2$ snakemake -nq --use-conda --conda-frontend mamba --cores 60 --directory /mnt/data1/Project_KeePace/Operational/4_data_analysis/5_grenepipe/run1/ --conda-prefix /home/dau1/software/conda-envs/

WorkflowError in line 28 of /home/dau1/software/grenepipe-0.12.2/rules/common.smk:
Config file is not valid JSON or YAML. In case of YAML, make sure to not mix whitespace and tab indentation.
File "/home/dau1/software/grenepipe-0.12.2/Snakefile", line 7, in <module>
File "/home/dau1/software/grenepipe-0.12.2/rules/common.smk", line 28, in <module>

The config file is in the run1 directory

snakemake is run from the grenepipe directory

Best

osp

with this attached config file: config.zip

The issue is that on line 7, you added some path to a directory:

data: "/mnt/data1/Project_KeePace/Operational/1_raw_data/"

which is not meant to be there. Only edit numbers, texts, and paths where the original file has them, otherwise you'll get an error. The data: line in the yaml file is the start of the group of options related to data, similar to the settings: on line 46. It is not meant to have any string following.

I assume that the directory that you are giving there is where your fastq files are? The locations of those files are specified in the samples table instead - no need to add any directory path for them.
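This kind of mistake (a section key such as `data:` given a value while it still has nested options below it) can be spotted with a rough line-based check before running Snakemake. The sketch below is a heuristic, not a YAML parser, and the function name `find_section_keys_with_values` is made up for illustration; it only uses the Python standard library:

```python
import re

# Matches "key:" or "key: value" lines; captures indentation, key, and the remainder.
KEY_LINE = re.compile(r"^(\s*)([^\s#:-][^:#]*):(?:\s+(.*))?$")


def find_section_keys_with_values(text):
    """Heuristic: flag keys that carry an inline value but are followed by
    more deeply indented keys, which YAML does not allow."""
    # Keep only meaningful lines: drop blank lines and full-line comments.
    lines = [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip() and not line.lstrip().startswith("#")
    ]
    problems = []
    for (number, line), (_, next_line) in zip(lines, lines[1:]):
        match = KEY_LINE.match(line)
        next_match = KEY_LINE.match(next_line)
        if not match or not next_match:
            continue
        value = (match.group(3) or "").strip()
        # Inline comments and block scalar markers do not count as a value.
        has_value = bool(value) and not value.startswith(("#", "|", ">"))
        is_nested = len(next_match.group(1)) > len(match.group(1))
        if has_value and is_nested:
            problems.append((number, match.group(2)))
    return problems


if __name__ == "__main__":
    # Illustrative snippet only: the nested key is a placeholder, not the real template.
    example = 'data: "/some/dir/"\n  some-option: "value"\n'
    print(find_section_keys_with_values(example))
```

Any line it reports is worth comparing against the original template; an empty result does not guarantee that the file is valid.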

Hope that helps Lucas

lczech commented 7 months ago

PS: You can also try tools such as https://www.yamllint.com/ to verify your file beforehand :-) In your case, it fails to give the exact cause of the error, but at least tells you whether the file is valid.

lczech commented 7 months ago

Hi @ospfsg,

from your recent email, I understand that you got it to work now, great! I'm going to close this issue then. For any further questions or problems, simply open a new one. GitHub Issues are the preferred way for questions, rather than email.

Thank you and so long Lucas