rki-mf1 / clean

A nextflow pipeline for decontamination of short reads, long reads and contigs
BSD 3-Clause "New" or "Revised" License

Use human GRCh38 T2T genome as new option #57

Closed: hoelzer closed this issue 2 months ago

hoelzer commented 1 year ago

OK, it is already the default :)

selmapichot commented 6 months ago

Hi, is it possible to use the recent T2T reference genome instead of GRCh38? Many thanks

hoelzer commented 6 months ago

Hi, thanks for the hint. Indeed, it would be nice to provide this. Do you have a link where one could download the recent T2T human genome FASTA?

Besides, if you want to use it right away, you can download the genome yourself and pass the FASTA to CLEAN via the --own parameter:

https://github.com/rki-mf1/clean/blob/711532eb50c059ac199bf386c84be9f409781fc5/clean.nf#L266
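
A minimal sketch of the --own route described above, assuming the T2T FASTA has already been downloaded locally. Only `--own` comes from this thread; the `--input_type` and `--input` parameter names, the file names, and the helper `build_clean_command` are assumptions for illustration, so check the pipeline's help output before relying on them.

```python
import shlex


def build_clean_command(reads, own_fasta, input_type="illumina"):
    """Assemble a CLEAN call with a user-supplied reference (hypothetical helper)."""
    return [
        "nextflow", "run", "rki-mf1/clean",
        "--input_type", input_type,  # assumed parameter name
        "--input", reads,            # assumed parameter name
        "--own", own_fasta,          # parameter named in this thread
    ]


# Hypothetical file names; replace with your own reads and reference.
cmd = build_clean_command("sample.fastq.gz", "t2t_genome.fasta")

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```

Building the command as a list and only printing it keeps the sketch side-effect free; the list could be handed to `subprocess.run` once the parameter names are confirmed.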

matthuska commented 5 months ago

I think the main NCBI page for that genome is here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_009914755.1/

They even provide a URL to download it, using curl, but I haven't tested it out yet:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_009914755.1/download?include_annotation_type=GENOME_FASTA&include_annotation_type=GENOME_GFF&include_annotation_type=RNA_FASTA&include_annotation_type=CDS_FASTA&include_annotation_type=PROT_FASTA&include_annotation_type=SEQUENCE_REPORT&hydrated=FULLY_HYDRATED
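
As a sketch of how that URL is put together, the snippet below rebuilds it from the accession while requesting only the genome FASTA. The endpoint and query parameters are copied from the URL above; the helper names, the output file name, and the choice to drop the other annotation types are assumptions, and like the original link this has not been tested against the live API.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Endpoint taken from the URL posted above.
API_BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession"


def build_download_url(accession, annotation_types=("GENOME_FASTA",)):
    """Build the dataset download URL for one accession (hypothetical helper)."""
    # Repeat include_annotation_type once per requested type, as in the original URL.
    query = [("include_annotation_type", t) for t in annotation_types]
    query.append(("hydrated", "FULLY_HYDRATED"))
    return f"{API_BASE}/{accession}/download?{urlencode(query)}"


def download_dataset(url, dest="t2t_dataset.zip"):
    """Fetch the archive; needs network access. The file name is hypothetical."""
    urlretrieve(url, dest)
    return dest


url = build_download_url("GCF_009914755.1")
print(url)
```

Requesting only `GENOME_FASTA` should keep the download smaller than the full URL above, which also pulls annotation, RNA, CDS and protein files that a decontamination reference does not need.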

hoelzer commented 2 months ago

This has now been added in the latest release.