antoine4ucsd closed 4 months ago
The major problem here is the mapping step. The example code is for tair10, so you need to change it. For this step, put the reference genome FASTA in the `data` directory. Use the whole-genome FASTA, not the per-chromosome FASTA files, unless you only want to align to a single chromosome. In `workflow-hisat2/workflow_hisat2-pe.yml`, replace the tair reference path with the path to your new reference genome, like `data/hg38.fasta`. The index file `hg38.fasta.fai` should also be placed in the `data` directory.
For the `create_db` step, similarly, download `hg38.gff` and replace the `file`, `dataSource`, and `organism` values.
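If only per-chromosome FASTA files are at hand, one way to get a whole-genome FASTA is to concatenate them. The following is a minimal Python sketch, not part of systemPipeR; the function names and any file names passed to them are hypothetical, and downloading a ready-made whole-genome FASTA is an equally valid route.

```python
from pathlib import Path


def concat_fasta(chrom_files, out_path):
    """Concatenate per-chromosome FASTA files into one genome FASTA.

    Files are streamed in 1 MiB chunks so large chromosomes are not
    loaded into memory at once. A newline is added after any file that
    does not end with one, so the next header starts on its own line.
    """
    with open(out_path, "wb") as out:
        for path in chrom_files:
            last = b"\n"
            with open(path, "rb") as src:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
    return Path(out_path)


def count_records(fasta_path):
    """Count FASTA records (header lines starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Counting the records afterwards with `count_records` is a quick sanity check that every chromosome made it into the combined file.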
Reading this page will help you understand how the yml file works in a workflow step: https://systempipe.org/sp/spr/cwl/cwl_and_spr/
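The reference swap in the yml file can be sketched as a plain text replacement. This is only an illustration under assumptions: it assumes the yml file names the reference as `tair10.fasta`, it does not parse YAML, and the helper names `swap_reference` and `missing_reference_files` are hypothetical rather than part of systemPipeR. Check the actual file before and after editing.

```python
from pathlib import Path


def swap_reference(yml_text, old_ref="tair10.fasta", new_ref="hg38.fasta"):
    """Return yml_text with every mention of old_ref replaced by new_ref.

    Raises ValueError when old_ref is absent, so a silent no-op cannot
    be mistaken for a successful edit.
    """
    if old_ref not in yml_text:
        raise ValueError(f"{old_ref!r} not found in the yml text")
    return yml_text.replace(old_ref, new_ref)


def missing_reference_files(data_dir, ref_name="hg38.fasta"):
    """List which of the FASTA and its .fai index are absent from data_dir."""
    data = Path(data_dir)
    wanted = [ref_name, ref_name + ".fai"]
    return [name for name in wanted if not (data / name).is_file()]
```

A possible use is to read `workflow-hisat2/workflow_hisat2-pe.yml` with `Path.read_text()`, pass the text through `swap_reference`, write it back, and then call `missing_reference_files("data")` to confirm that both `hg38.fasta` and `hg38.fasta.fai` are in place.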
@tgirke Would you mind giving additional guidance if any?
Thank you for taking the time to get back to me.
Hello, thank you for providing such an impressive platform. I am new to RNA-seq data processing. I have a dataset of 80 samples that need to be processed, mapped to the human genome, and analyzed for differential expression (DE) between groups. I wanted to start with a couple of samples, but I encountered a few issues/limitations. I will try to give details below.
My data are saved in a specific folder, so I created my own targetsPE file: targetsPE.example.txt
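For readers building a similar file for many samples, here is a rough Python sketch that assembles a paired-end targets table from FASTQ file names. Several things are assumptions here: the tab-separated columns `FileName1`, `FileName2`, `SampleName`, and `Factor` (the real targets file may need more columns or comment lines), the hypothetical naming pattern `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, and the caller-supplied `factors` mapping.

```python
import csv
import io
import re

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def build_targets(fastq_names, fastq_dir, factors):
    """Return tab-separated targets text with one row per read pair.

    fastq_names: file names found in fastq_dir
    factors: dict mapping sample name to its experimental group
    """
    pairs = {}
    for name in sorted(fastq_names):
        match = PAIR_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        sample, mate = match["sample"], match["mate"]
        pairs.setdefault(sample, {})[mate] = f"{fastq_dir}/{name}"

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["FileName1", "FileName2", "SampleName", "Factor"])
    for sample, mates in sorted(pairs.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"incomplete read pair for sample {sample}")
        writer.writerow([mates["1"], mates["2"], sample, factors[sample]])
    return buf.getvalue()
```

Raising on an incomplete pair is a deliberate choice: with 80 samples, a missing mate file is easier to catch here than after the mapping step has run.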
Next, I started preparing the workflow.
Read preprocessing
Trimmomatic: while the script works with Trimmomatic, I am trying without it first.
quality report
This step is working. The output format is definitely not good when there are many samples; it would be better to save one report per sample, IMO.
Alignments
Read mapping with HISAT2
Here, HISAT2 runs without error, but I would need some guidance to align to the human genome (hg). Any chance you have a template? I do have each chromosome as a FASTA file, and I can also download indexes here: https://daehwankimlab.github.io/hisat2/download/ but my understanding is that it aligns to the reference in the default data folder (tair10.fasta)?
So my resulting alignment is as follows (~0.01% aligned):
  FileName   Nreads2x Nalign Perc_Aligned Nalign_Primary Perc_Aligned_Primary
1 D8_C1      67737666   7154     0.010561           7154             0.010561
2 D8_CPLUSI1 45082378   5210     0.011557           5210             0.011557
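As a quick check on how to read this table, the short Python sketch below recomputes the aligned fraction from the raw counts. It assumes `Perc_Aligned` is `Nalign` divided by `Nreads2x`, expressed in percent; that interpretation is inferred from the numbers, not from the package documentation.

```python
def percent_aligned(n_align, n_reads):
    """Aligned reads as a percentage of all reads."""
    return 100.0 * n_align / n_reads


# (sample, Nreads2x, Nalign) copied from the table above
rows = [
    ("D8_C1", 67737666, 7154),
    ("D8_CPLUSI1", 45082378, 5210),
]

for name, n_reads, n_align in rows:
    print(f"{name}: {percent_aligned(n_align, n_reads):.6f}%")
```

Under that assumption, both samples come out at roughly 0.01% aligned, which is what one would expect when human reads are mapped against the tair10 reference.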
After changing the reference genome, I would also need some guidance to adapt the command line below.
I know this is a lot, but I was hoping others may have the same kind of data to process.