sjroth / ARTDeco

MIT License
IndexError: list index out of range #19

Closed ZiggeyQi closed 11 months ago

ZiggeyQi commented 1 year ago

Hi, I'm new to bioinformatics packages, and I apologize for my trouble setting up the ARTDeco environment. First, I tried to install ARTDeco by following the README. I wanted to install the required packages with 'conda env create -f environment.yml', but it failed: it stalled at the "Solving environment" step, perhaps because of my internet connection or some issue I cannot figure out. So I created a conda environment named ARTDeco and installed the required packages manually. All the packages meet the requirements except samtools, because my samtools is newer than version 1.9. The ARTDeco environment has python=3.8, R=4.2.0, and BiocManager=3.15, with the DESeq2 version matching BiocManager=3.15. Then I prepared everything and ran the following code. After running for a few minutes, I got the error message below:

code

bin_ARTDceo='/public3/home/scg7144/anaconda3/envs/ARTDeco/bin/ARTDeco'
ref_gtf='/public3/home/scg7144/all_index/GRCh38.p14_genomic.gtf/GCF_000001405.40_GRCh38.p14_genomic.gtf'
chromsize='/public3/home/scg7144/all_index/hg38.p14.chrom.esize/hg38.p14.genome.chrom.sizes'
meta_file='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/group_define.txt'
comparation='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/comparation.txt'
work_dir='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/'
hostBamfile='/public3/home/scg7144/HSV_RNA_Seq/align/host_Bam/'
cd /public3/home/scg7144/HSV_RNA_Seq/read_in_through/
source /public3/home/scg7144/anaconda3/bin/activate ARTDeco
ARTDeco -home-dir $work_dir -bam-files-dir $hostBamfile -gtf-file $ref_gtf -cpu 12 -chrom-sizes-file $chromsize -meta-file $meta_file -comparisons-file $comparation

error message

No valid run mode specified... Will generate all files... Loading ARTDeco file structure... Reformatted meta file exists... Reformatted comparisons file exists... ARTDeco will generate the following files: /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/Mut_12h-Mock-read_in_assignment.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-3_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/readthrough/read_in_assignments.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-2_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/readthrough /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/WT_12h-Mut_12h-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-6h-3_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-1_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-12h-3_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-2_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-2_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-3_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification/gene.exp.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-12h-1_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_12h-Mock-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-1_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-2_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification /public3/home/scg7144/HSV_RNA_Seq/read_in_through/readthrough/readthrough.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-2_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-3_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-3_host.dogs.fpkm.txt 
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-2_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/all_dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_6h-Mock-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/Mut_12h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-2_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification/max_isoform.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification/read_in.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/all_dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/read_in.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/WT_6h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-3_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-1_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_12h-Mut_12h-read_in_assignment.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_6h-Mut_6h-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/Mut_6h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-1_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-2_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-1_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-3_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-3_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-3_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/WT_12h-Mut_12h-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/readthrough.bed 
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-1_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/Mut_6h-Mock-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-2_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-3_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-1_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-12h-1_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-6h-2_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/Mut_12h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-3_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/readthrough/read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification/gene.exp.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/Mut_6h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/readthrough/corrected_exp.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-1_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/WT_12h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mock-2_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-3_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mock-1_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-6h-2_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-3_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-1_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_12h-Mock-read_in_assignment.txt 
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-3_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-12h-2_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-2_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mock-3_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/WT_6h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-6h-1_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-1_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-1_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/WT_6h-Mut_6h-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-6h-1_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-12h-2_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_12h-Mut_12h-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-12h-2_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/WT_12h-Mock-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-1_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-6h-3_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs/WT_6h-Mut_6h-results.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/Mut_6h-Mock-read_in_assignment.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-2_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/Mut_12h-Mock-read_in.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/Mut-12h-3_host /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-3_host.dogs.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-1_host.dogs.bed 
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-3_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-2_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/WT-6h-1_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_6h-Mock-read_in_assignment.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-1_host.dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_dogs /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/all_dogs.bed /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mock-2_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/quantification/readthrough.raw.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-6h-2_host.dogs.fpkm.txt /public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp_read_in/WT_6h-Mut_6h-read_in_assignment.txt
GTF file needed... Checking... GTF file exists... BAM file format needed... Checking... Will infer if not user-specified. No layout specified... Will infer... No strandedness specified... Will infer... No strand orientation specified... Will infer... Will infer BAM formats... Full genes BED file exists... Inferring BAM file formats...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/misc.py", line 299, in infer_experiment
    pos = float(out[2].split(':')[1])
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/bin/ARTDeco", line 33, in <module>
    sys.exit(load_entry_point('ARTDeco==0.4', 'console_scripts', 'ARTDeco')())
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/main.py", line 326, in main
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/misc.py", line 317, in infer_experiments_group
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

error message end

I tried to fix the "IndexError: list index out of range" but failed. Could you give me some advice on preparing the ARTDeco environment, or on how to fix or avoid this error? I look forward to your reply. Best regards.

sjroth commented 1 year ago

Hi,

I would not fixate on the conda environment. The newer versions of conda are very finicky with R. I will probably create a docker container to solve this problem.

As for your run issue, is your GTF file properly formatted?

Best, Sam

ZiggeyQi commented 1 year ago

Thanks for your response. I downloaded my GTF file from NCBI. I'm not sure whether its format is entirely correct, but I can use the same GTF file with other packages, e.g. the computeMatrix function of deepTools. I have pasted the GTF content as shown by "less" and "head" below:

less

#gtf-version 2.2
#!genome-build GRCh38.p14
#!genome-build-accession NCBI_Assembly:GCF_000001405.40
#!annotation-source NCBI Homo sapiens Annotation Release 110
NC_000001.11 BestRefSeq gene 11874 14409 . + . gene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true";
NC_000001.11 BestRefSeq transcript 11874 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript";
NC_000001.11 BestRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "1";
NC_000001.11 BestRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "2";
NC_000001.11 BestRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "3";
NC_000001.11 BestRefSeq gene 14362 29370 . - . gene_id "WASH7P"; transcript_id ""; db_xref "GeneID:653635"; db_xref "HGNC:HGNC:38034"; description "WASP family homolog 7, pseudogene"; gbkey "Gene"; gene "WASH7P"; gene_biotype "transcribed_pseudogene"; gene_synonym "FAM39F"; gene_synonym "WASH5P"; pseudo "true";
NC_000001.11 BestRefSeq transcript 14362 29370 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; db_xref "GeneID:653635"; gbkey "misc_RNA"; gene "WASH7P"; product "WASP family homolog 7, pseudogene"; pseudo "true"; transcript_biotype "transcript";
NC_000001.11 BestRefSeq exon 29321 29370 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; db_xref "GeneID:653635"; gene "WASH7P"; product "WASP family homolog 7, pseudogene"; pseudo "true"; transcript_biotype "transcript"; exon_number "1";
NC_000001.11 BestRefSeq exon 24738 24891 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; db_xref "GeneID:653635"; gene "WASH7P"; product "WASP family homolog 7, pseudogene"; pseudo "true"; transcript_biotype "transcript"; exon_number "2";
NC_000001.11 BestRefSeq exon 18268 18366 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; d

head

#gtf-version 2.2
#!genome-build GRCh38.p14
#!genome-build-accession NCBI_Assembly:GCF_000001405.40
#!annotation-source NCBI Homo sapiens Annotation Release 110
NC_000001.11 BestRefSeq gene 11874 14409 . + . gene_id "DDX11L1"; transcript_id ""; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1 (pseudogene)"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true";
NC_000001.11 BestRefSeq transcript 11874 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript";
NC_000001.11 BestRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "1";
NC_000001.11 BestRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "2";
NC_000001.11 BestRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1 (pseudogene)"; pseudo "true"; transcript_biotype "transcript"; exon_number "3";
NC_000001.11 BestRefSeq gene 14362 29370 . - . gene_id "WASH7P"; transcript_id ""; db_xref "GeneID:653635"; db_xref "HGNC:HGNC:38034"; description "WASP family homolog 7, pseudogene"; gbkey "Gene"; gene "WASH7P"; gene_biotype "transcribed_pseudogene"; gene_synonym "FAM39F"; gene_synonym "WASH5P"; pseudo "true";

If you have time, please check whether my GTF file format is correct, or recommend a specific GTF file version or download link. Thanks a lot.

ZiggeyQi commented 1 year ago

Hi, I replaced the GTF file with hg38.refGene.gtf downloaded from UCSC, but I got the same error message. I used HISAT2 to align my raw data to the reference genome downloaded from NCBI (hg38.p14). Is there any problem with aligning to a relatively new reference genome, or with my choice of aligner? Should I align my raw data to an older version of the reference genome, or use a different aligner? Did you test any human RNA-seq data? If so, can you tell me the details of the test (the alignment tool, the reference genome, and the GTF file for ARTDeco) so that I can follow your steps with my data? Thanks again.

sjroth commented 1 year ago

Hi, the first step that you need to do is completely read the documentation as well as the peer-reviewed citation that I wrote. In there, I describe how to check the format of your GTF. In my paper, you will find that the main data that was used for developing ARTDeco was human total RNA-seq data. There are several potential issues. Firstly, I would strongly recommend against any new reference genome as there may be scaffolds that are not accounted for in the GTF. You need to check that the chromosome names line up. When originally developing ARTDeco, I used the Ensembl gene annotation (and modified it as described in the README) with hg38.

I can attempt to debug your current problems as well. I need you to delete all files but your BAM files and re-run the script. Then, send me one BAM file and your genes.full.bed file.
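
The advice above to check that chromosome names line up can be sketched in a few lines of Python. This is only an illustration, not part of ARTDeco: the function names and the toy inputs are hypothetical, and whether a naming mismatch caused the error in this thread is not confirmed. It compares the first column of a GTF with the first column of a chrom sizes file.

```python
import io


def sequence_names(handle):
    """Return the set of values in the first tab-separated column.

    Blank lines and header/comment lines (starting with '#') are skipped.
    """
    names = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_sequence_names(gtf_handle, sizes_handle):
    """Return (names only in the GTF, names only in the chrom sizes file)."""
    gtf_names = sequence_names(gtf_handle)
    size_names = sequence_names(sizes_handle)
    return sorted(gtf_names - size_names), sorted(size_names - gtf_names)


# Toy inputs (hypothetical): a RefSeq-style name in the GTF versus
# UCSC-style names in the chrom sizes file.
toy_gtf = io.StringIO(
    "#gtf-version 2.2\n"
    'NC_000001.11\tBestRefSeq\tgene\t11874\t14409\t.\t+\t.\tgene_id "DDX11L1";\n'
)
toy_sizes = io.StringIO("chr1\t248956422\nchr2\t242193529\n")

only_in_gtf, only_in_sizes = compare_sequence_names(toy_gtf, toy_sizes)
print("only in GTF:", only_in_gtf)
print("only in chrom sizes:", only_in_sizes)
```

With real files, the two `io.StringIO` objects would be replaced by `open(path)` handles. Any name that appears on only one side is worth investigating before running the pipeline.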

ZiggeyQi commented 1 year ago

I'm so sorry for overlooking the details in the paper, and thanks for your kind help. I will try again following your steps. If I still cannot resolve the issue, I'll ask for your help again. Thanks again, best regards.

ZiggeyQi commented 1 year ago

Hi, I aligned my trimmed raw data to GENCODE GRCh38 release 21 using STAR with default parameters. I used the GENCODE GRCh38 FASTA file and the GENCODE GRCh38 V28 GTF file to generate the STAR index, and the same V28 GTF file was used for the alignment. Code as below:

# STAR align code
$bin_STAR --runThreadN 20 \
  --genomeDir $homo_index \
  --readFilesCommand gunzip -c \
  --outSAMtype BAM SortedByCoordinate \
  --sjdbGTFfile $ref_gtf \
  --outFileNamePrefix /public3/home/scg7144/HSV_RNA_Seq/align/${sample} \
  --outReadsUnmapped /public3/home/scg7144/HSV_RNA_Seq/align/${file}.HSV \
  --quantMode TranscriptomeSAM GeneCounts \
  --sjdbOverhang 149 \
  --readFilesIn $t_fq1 $t_fq2

Once I had the BAM file, I used it with ARTDeco. Before that, I ran "samtools faidx genome.fa" followed by "cut -f1,2 genome.fa.fai > genome.chrom.sizes" to get the chrom sizes file for the GENCODE GRCh38 FASTA, and "awk '{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \"\";"; }' genes.gtf > modified_genes.gtf" to produce the modified GENCODE GRCh38 V28 GTF file. I also ran "gtf2bed < genes.gtf" to check the modified GTF file and got no error message. However, I still get the same error message as before. Could you test my BAM file with ARTDeco? I would really appreciate your help. As you requested, I uploaded one BAM file and the genes.full.bed file to Google Drive; you can access them at the following link: https://drive.google.com/drive/folders/1l8atrNibBeS9yiBIuxfRwHF-7CbzY8t8?usp=sharing

# ARTDeco code
bin_ARTDceo='/public3/home/scg7144/anaconda3/envs/ARTDeco/bin/ARTDeco'
ref_gtf='/public3/home/scg7144/all_index/homo_genome_GTF/GTF/modified_genecode.v28.chr_patch_hapl_scaff.annotation.gtf'
chromsize='/public3/home/scg7144/all_index/homo_genome_GTF/genome/genecode_GRCh38.genome.chrom.sizes'
meta_file='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/group_define.txt'
comparation='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/comparation.txt'
work_dir='/public3/home/scg7144/HSV_RNA_Seq/read_in_through/'
hostBamfile='/public3/home/scg7144/HSV_RNA_Seq/align/STAR_align/bamfile/transcriptome_bam/'
cd /public3/home/scg7144/HSV_RNA_Seq/read_in_through/
source /public3/home/scg7144/anaconda3/bin/activate ARTDeco
ARTDeco -home-dir $work_dir -bam-files-dir $hostBamfile -gtf-file $ref_gtf -cpu 8 -chrom-sizes-file $chromsize -meta-file $meta_file -comparisons-file $comparation

# error message
No valid run mode specified... Will generate all files... Loading ARTDeco file structure... Meta file properly formatted... Generating reformatted meta... Comparisons file exists... Comparisons file properly formatted... Generating reformatted comparisons... ARTDeco will generate the following files:
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/Mut-12h-2Aligned.toTranscriptome.out.dogs.raw.txt
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/preprocess_files/WT-6h-1Aligned.toTranscriptome.out
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/diff_exp/Mut_6h-Mock-results.txt
/public3/home/scg7144/HSV_RNA_Seq/read_in_through/dogs/all_dogs.raw.txt
..................................................
GTF file needed... Checking... GTF file exists... BAM file format needed... Checking... Will infer if not user-specified. No layout specified... Will infer... No strandedness specified... Will infer... No strand orientation specified... Will infer... Will infer BAM formats... Generating full genes BED file... Convert GTF to BED... Generating condensed genes bed... Inferring BAM file formats...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/misc.py", line 299, in infer_experiment
    pos = float(out[2].split(':')[1])
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/bin/ARTDeco", line 33, in <module>
    sys.exit(load_entry_point('ARTDeco==0.4', 'console_scripts', 'ARTDeco')())
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/main.py", line 326, in main
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/misc.py", line 317, in infer_experiments_group
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

sjroth commented 1 year ago

Hi,

I will look into this. As a request, please try to keep queries brief. It makes it easier to address issues.

Sam

sjroth commented 1 year ago

Hi,

I cannot reproduce your error so I am unsure what the issue is. I did note when inspecting your files that, against my suggestion, you are using annotations that include scaffolds rather than sticking to canonical chromosomes. The best guess that I have is that there is an issue with one or more of your BAM files. The error is saying that RSeQC strandedness inference is not working. Do you know the format of your data? Is there a reason you need to infer the format of your data?

Best, Sam
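
To illustrate why a failed strandedness inference surfaces as an IndexError: the traceback shows ARTDeco indexing into the tool's output with `out[2].split(':')[1]`, which breaks if that output is shorter than expected (for instance, if the tool printed nothing useful). The sketch below is not ARTDeco's code. The function name is hypothetical, and the sample text assumes the usual "Fraction of reads ...: value" layout of RSeQC's infer_experiment.py output. It shows one way to parse such text while failing with a readable message.

```python
def parse_infer_experiment(text):
    """Extract 'Fraction of reads ...: value' lines from RSeQC-style text.

    Returns a dict mapping the text before the last colon to a float.
    Raises ValueError (including the raw text) if fewer than three
    fraction lines are present, instead of an opaque IndexError.
    """
    fractions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Fraction of reads"):
            continue
        # Split on the last colon; the labels themselves contain no colons.
        label, _, value = line.rpartition(":")
        fractions[label] = float(value)
    if len(fractions) < 3:
        raise ValueError(
            "Unexpected infer_experiment.py output; got:\n" + (text or "<empty>")
        )
    return fractions


# Assumed example of successful output for paired-end, unstranded data.
sample_output = """This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
"""

fractions = parse_infer_experiment(sample_output)
for label, value in fractions.items():
    print(label, "->", value)
```

Passing empty text, or an error message printed in place of the fractions, raises a ValueError that shows what the tool actually printed, which is usually the quickest clue to the underlying problem.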

ZiggeyQi commented 1 year ago

Hi, I don't know how to check my BAM file. My data is polyA RNA-seq and non-stranded. I used the parameters "-layout PE -stranded False", and the "IndexError: list index out of range" disappeared, but I got a new error message, shown below. This happened even though the GTF file used for index generation, alignment, and ARTDeco (preprocess mode only) contains the comprehensive gene annotation on the reference chromosomes only (downloaded from GENCODE).

error

Convert GTF to BED... Generating condensed genes bed... Generating read-in region BED file...
Traceback (most recent call last):
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/bin/ARTDeco", line 33, in <module>
    sys.exit(load_entry_point('ARTDeco==0.4', 'console_scripts', 'ARTDeco')())
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/main.py", line 426, in main
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/ARTDeco-0.4-py3.8.egg/ARTDeco/preprocess.py", line 349, in create_unstranded_read_in_df
  File "/public3/home/scg7144/anaconda3/envs/ARTDeco/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'

Best, Hansong

sjroth commented 1 year ago

That is an issue with your version of pandas. Use version 0.24 to be safe as the append function was deprecated at some point (I don't remember the last version). I may end up writing a docker container to avoid these versioning errors.
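
For reference, `DataFrame.append` was deprecated in pandas 1.4.0 and removed in pandas 2.0.0, with `pandas.concat` as the documented replacement, so any pandas release before 2.0 still provides it. The snippet below is a minimal sketch, not part of ARTDeco, and its function names are hypothetical. It uses only the standard library to report whether the pandas installed in the active environment is old enough.

```python
from importlib import metadata


def has_dataframe_append(version):
    """True if this pandas version string predates 2.0 (append removed in 2.0.0)."""
    major = int(version.split(".", 1)[0])
    return major < 2


def check_installed_pandas():
    """Describe whether the installed pandas (if any) still has DataFrame.append."""
    try:
        version = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return "pandas is not installed in this environment"
    if has_dataframe_append(version):
        return f"pandas {version}: DataFrame.append is available"
    return f"pandas {version}: DataFrame.append was removed; pin an older pandas"


print(check_installed_pandas())
```

Running this inside the activated ARTDeco environment before starting a long job gives an early warning about this particular AttributeError.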

The error you were getting earlier was due to RSeQC not parsing your BAM file correctly. As a test, you can run the following code on each BAM file:

infer_experiment.py -r genes.full.bed -i BAM_file

If you give me the output of that, then we can go from there.

This difficulty in usage is not common, so I will again offer a Zoom call to iron out these issues rather than exchanging GitHub comments. As an alternative, I will again recommend that you use the genome versions in the ARTDeco paper with only canonical chromosomes.
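
As a rough illustration of restricting an annotation to canonical chromosomes, the sketch below filters GTF lines by their first column. It is not taken from ARTDeco. The function name is hypothetical, and it assumes UCSC/GENCODE-style names (chr1 to chr22, chrX, chrY, chrM); a RefSeq- or Ensembl-style file would need a different name set. The reference genome and the BAM files would have to follow the same convention for the filter to be meaningful.

```python
import io

# Assumed canonical set for a human UCSC/GENCODE-style annotation.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def keep_canonical(gtf_lines, canonical=CANONICAL):
    """Yield header lines and records whose sequence name is canonical."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
        elif line.split("\t", 1)[0] in canonical:
            yield line


# Toy GTF (hypothetical records): one canonical and one scaffold entry.
toy_gtf = io.StringIO(
    "##description: toy annotation\n"
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1";\n'
    'chr1_KI270706v1_random\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G2";\n'
)

kept = list(keep_canonical(toy_gtf))
print("".join(kept), end="")
```

With a real annotation, the lines yielded by `keep_canonical(open(path))` could be written to a new GTF file and used in place of the original.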

ZiggeyQi commented 1 year ago

Hi, I really appreciate your help, and I am happy to tell you that everything works now. I guess the "IndexError: list index out of range" resulted from the missing parameters "-layout PE -stranded False" and from the GTF file (it should contain the canonical chromosomes only), because ARTDeco prefers to analyze stranded RNA-seq by default. I used the correctly formatted GTF file for index generation and alignment, ran ARTDeco in an environment prepared as described in the README, and it finally worked. I also ran "infer_experiment.py -r genes.full.bed -i BAM_file"; the output is below:
Traceback (most recent call last):
  File "/public3/home/scg7144/micromamba/envs/ARTDecooo/bin/infer_experiment.py", line 53, in <module>
    from qcmodule import SAM
  File "/public3/home/scg7144/micromamba/envs/ARTDecooo/lib/python3.6/site-packages/qcmodule/SAM.py", line 22, in <module>
    import pysam
  File "/public3/home/scg7144/micromamba/envs/ARTDecooo/lib/python3.6/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
ImportError: libhts.so.2: cannot open shared object file: No such file or directory

I recommend that new users use mamba to prepare the ARTDeco environment; it handles dependency conflicts well and saves installation time. I just ran "mamba env create -f environment.yml" to get everything, which might avoid potential errors caused by version conflicts among the required packages.

Thanks again, Sam. Best, Hansong

sjroth commented 11 months ago

I am glad you got it to work. I will be writing a Docker container to encapsulate operations, as I cannot maintain ARTDeco as often as I could as a PhD student. I will also be doing a few updates for known bugs. I plan to do this by ~September.