sanger-tol / genomeassembly

Implementation of ToL genome assembly workflows
https://pipelines.tol.sanger.ac.uk/genomeassembly
MIT License

Input & help files #36

Open cintiaoi opened 6 months ago

cintiaoi commented 6 months ago

Description of feature

@muffato @gq1 @ksenia-krasheninnikova Hi, I want to run the pipeline on some insect genomes. Are there any example input or help files I can start with? Thanks

ksenia-krasheninnikova commented 6 months ago

Hi @cintiaoi, have you had a look here?

https://github.com/sanger-tol/genomeassembly/blob/main/docs/usage.md
https://github.com/sanger-tol/genomeassembly/blob/main/docs/output.md

There are some example YAML files in the /assets folder in the repo.

cintiaoi commented 6 months ago

Hi @ksenia-krasheninnikova, thanks for your fast reply. I've checked assets/test.yaml and the other YAML files, but they contain full paths on a system we don't have access to. So that we can run the pipeline, could you let me know what those files look like?

ksenia-krasheninnikova commented 6 months ago

Have a look here:

https://darwin.cog.sanger.ac.uk/genomeassembly_test_data.tar.gz

This dataset corresponds to assets/test_github.yaml
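A minimal sketch of fetching and unpacking that archive, assuming `curl` and `tar` are available. The function name `fetch_test_data` and the destination directory are hypothetical, and the paths in assets/test_github.yaml should be checked against wherever the files end up.

```shell
# Download and unpack the test dataset linked above.
# fetch_test_data is a hypothetical helper; the destination defaults to the current directory.
fetch_test_data() {
    local url="https://darwin.cog.sanger.ac.uk/genomeassembly_test_data.tar.gz"
    local dest="${1:-.}"
    local archive="$dest/genomeassembly_test_data.tar.gz"

    mkdir -p "$dest" || return 1
    # -f: fail on HTTP errors, -L: follow redirects
    curl -fL -o "$archive" "$url" || return 1
    # Extract into the destination directory
    tar -xzf "$archive" -C "$dest"
}

# Example call (hypothetical directory name):
# fetch_test_data ./genomeassembly_test
```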

gq1 commented 6 months ago

Here are the instructions for running the test locally: https://github.com/sanger-tol/genomeassembly/blob/main/docs/usage.md#local-testing

cintiaoi commented 6 months ago

Thanks! So can we run the pipeline without 10x data? It looks that way in main.nf. We have PacBio and Hi-C data. Also, there is a mito.fam file and we are not sure what it is. Is there an example? Thanks again

ksenia-krasheninnikova commented 6 months ago

If you keep the polishing step switched off in the config file with polishing_on = false (like here), then the 10X data is not needed. You don't need the .fam file to run the pipeline from the main branch at the moment; this feature will be available in the next release.

Hope this helps!

cintiaoi commented 4 months ago

Just a follow-up on the things I changed to run my own data.

juicer_tools_pre.nf: I had to change the Java version.

merquryfk/main.nf:

    process MERQURYFK_MERQURYFK {
        tag "$meta.id"
        label 'process_medium'

In the Nextflow config, I had to change the memory to run on my server, including:

apptainer.registry = 'quay.io'
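For anyone who would rather not edit the module files, a rough sketch of an alternative is to put the overrides in a separate config file and pass it with Nextflow's `-c` option. This is not what was done above. The file name `custom.config`, the helper `write_custom_config`, and the 32 GB value are all illustrative assumptions; only the process name and the registry line come from this thread.

```shell
# Write a small Nextflow config with per-process overrides.
# write_custom_config is a hypothetical helper; the memory value is a placeholder.
write_custom_config() {
    local out="${1:-custom.config}"
    cat > "$out" <<'EOF'
process {
    // Override resources for one process by name (value is an assumption)
    withName: 'MERQURYFK_MERQURYFK' {
        memory = 32.GB
    }
}

// Registry setting taken from the comment above
apptainer.registry = 'quay.io'
EOF
}

# Example (the profile name, --input, and the YAML file name are assumptions; see docs/usage.md):
# write_custom_config custom.config
# nextflow run sanger-tol/genomeassembly -profile apptainer -c custom.config \
#     --input my_dataset.yaml --outdir results
```

Keeping the overrides in one file means the pipeline source stays unmodified, which can make it easier to update to a later release.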

For my genome I only had a SAM file, so I created a CRAM file from it. I still got errors because the .crai index file was missing, so I created that with samtools as well. After that it worked.
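The conversion described above can be sketched roughly as follows, assuming samtools is installed. The function name `sam_to_indexed_cram` and all file names are hypothetical, and the exact commands used in this thread were not posted. The reference FASTA argument is optional here on the assumption that it is only needed when the SAM contains aligned reads.

```shell
# Convert a SAM file to CRAM and create the .crai index next to it.
# sam_to_indexed_cram is a hypothetical helper; arguments: input SAM, output CRAM, optional reference FASTA.
sam_to_indexed_cram() {
    local sam="$1" cram="$2" ref="${3:-}"

    if [ -n "$ref" ]; then
        # -C: write CRAM, -T: reference FASTA used for CRAM encoding
        samtools view -C -T "$ref" -o "$cram" "$sam" || return 1
    else
        samtools view -C -o "$cram" "$sam" || return 1
    fi

    # Writes "$cram.crai" alongside the CRAM file
    samtools index "$cram"
}

# Example call (hypothetical file names):
# sam_to_indexed_cram hic_reads.sam hic_reads.cram reference.fasta
```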

I created a fork on my GitHub account, and the modified files are there. Thanks for your help!