Open junjunlab opened 4 weeks ago
Hi, I think the problem is with longest.transcripts.info.txt. The correct file should have the following columns:
$ less -S longest.transcripts.info.txt|sed -n "1p"|sed -s "s/\t/\n/g"|nl
1 chrom
2 trans_id
3 strand
4 gene_id
5 gene_name
6 transcript_biotype
7 gene_start
8 gene_stop
9 CDS_start
10 CDS_stop
11 CDS_length
12 5UTR_length
13 3UTR_length
14 transcript_length
If yours does not, check your reference file first and try again.
best wishes
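The column check described above can also be scripted. The following is only a minimal sketch, assuming the file is tab-separated with a single header line; `check_header` and `EXPECTED_COLUMNS` are hypothetical names introduced here and are not part of RiboMiner.

```python
import csv

# Column names copied from the header listing above.
EXPECTED_COLUMNS = [
    "chrom", "trans_id", "strand", "gene_id", "gene_name",
    "transcript_biotype", "gene_start", "gene_stop", "CDS_start",
    "CDS_stop", "CDS_length", "5UTR_length", "3UTR_length",
    "transcript_length",
]


def check_header(path):
    """Return a list of problems found in the first line of a tab-separated file."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    problems = []
    if len(header) != len(EXPECTED_COLUMNS):
        problems.append(
            f"expected {len(EXPECTED_COLUMNS)} columns, found {len(header)}"
        )
    # Compare names position by position so a shifted column is easy to spot.
    for position, (found, expected) in enumerate(zip(header, EXPECTED_COLUMNS), start=1):
        if found != expected:
            problems.append(
                f"column {position}: expected {expected!r}, found {found!r}"
            )
    return problems
```

Calling `check_header("longest.transcripts.info.txt")` returns an empty list when the header matches the listing, and one message per mismatch otherwise.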
I re-created longest.transcripts.info.txt and ran MetageneAnalysis -f attributes.txt -c longest.transcripts.info.txt -o ./4e2_metagene, but I got another error, as follows:
longest.transcripts.info.txt:
That's weird! pysam's fetch retrieves the mapped reads from the bam file. The only cause I can think of is that this transcript has no mapped reads and its transcript id does not appear in the bam file. Could you please send your bam file and attribute.txt file so I can reproduce the problem?
Sure, this is my upstream code:
attribute.txt file:
This is my data:
File shared via Baidu Netdisk: bam Link: https://pan.baidu.com/s/1U3Zmc0gwFAM7PHrN9qANKA?pwd=dddv Extraction code: dddv
Hi, I converted your bam into a fastq file and re-analyzed it with RiboMiner and I did not encounter any problems. Here is the command I used:
Ref=/home/00.Reference/human/ensemble110
transcript=/home/00.Reference/human/ensemble110/RiboCode_annot/transcripts_sequence.fa
results=$workdir/MA
attribute=$workdir/configure.txt
trans_info=$Ref/longest.transcripts.info.txt
groups='4E2'
replicates='4E2'
mkdir -p $results
MetageneAnalysis -f $attribute -c $trans_info -o $results/MA_normed -U codon -M counts -u 0 -d 500 -l 100 -n 10 -m 5 -e 5 --norm yes -y 100 --CI 0.95 --type CDS
and here is the running info:
your input: 1 bam files
19726 transcripts will be used in the following analysis.
Length filter(-l)---Transcripts number filtered by criterion one is : 782
Length filter (3n)---Transcripts number filtered by criterion two is : 120
Total counts filter---Transcripts number filtered by criterion three is : 18324
CDS density filter(RPKM-n or counts-n)---Transcripts number filtered by criterion four is : 453
CDS density filter(normed-m)---Transcripts number filtered by criterion five is : 0
Metaplots Transcript Number for bam file../08.STAR/4E2_ribo_STAR/4E2_ribo.Aligned.toTranscriptome.out.sorted.bam is :47
Finish the step of ribosomeDensityNormPerTrans
Finish the step of MetageneAnalysis!
findfont: Font family 'Arial' not found.
Finish the step of metagenePlot!
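As a quick sanity check on the running info above, the five filter counts should add up to the difference between the starting transcript number and the number used for the metaplots. This is only an arithmetic sketch using the figures printed in the log; the dictionary keys are shorthand labels chosen here, not RiboMiner identifiers.

```python
# Numbers copied from the MetageneAnalysis log above.
total_transcripts = 19726

filtered = {
    "length (-l)": 782,
    "length (3n)": 120,
    "total counts": 18324,
    "CDS density (RPKM-n or counts-n)": 453,
    "CDS density (normed-m)": 0,
}

# Transcripts left after all five filters have been applied.
remaining = total_transcripts - sum(filtered.values())
print(remaining)  # 47, matching "Metaplots Transcript Number ... is :47"

# Share of the starting transcripts removed by each filter.
for name, count in filtered.items():
    print(f"{name}: {count / total_transcripts:.1%}")
```

Most transcripts are removed by the total counts filter, which is consistent with the low sequencing depth of this sample.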
I also noticed that although the length distribution is OK, the periodicity is not good for this sample. Too many reads are mapped to intronic or intergenic regions, indicating potential DNA contamination.
# cutadapt
sample Total Trimmed(Percent) shortNum(Percentage) LeftNum(Percentage)
4E2 4,787,469 152,867 (3.2%) 164 (0.0%) 4,787,305 (100.0%)
# filtering
sample inputNum Remained discarded(Percent)
4E2 4787305 4693320 93985 (1%)
# remove rRNA contamination
sample ProcessedNum rRNA(Percent) noContamRNA(Percent)
4E2_ribo 4693320 44 (0.00%) 4693276 (100.00%)
# remove tRNA contamination
sample ProcessedNum tRNA(Percent) noContamRNA(Percent)
4E2_ribo_tRNA 4693276 0 (0.00%) 4693276 (100.00%)
# Star mapping
sample input UniquelyMapped(Percent) MutipulMapped(Percent)
4E2_ribo 4693276 4495285 (95.78%) 187047 (3.99%)
# DNA contamination
sample Exon DNA Intron ambiguous_RNA
4E2 195542 4049406 244464 5873
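To put the DNA contamination row in proportion, the four categories can be converted to fractions of their sum. This is a small sketch using the counts from the table above; the reading that the "DNA" column counts reads outside annotated exons and introns is an assumption based on the comment about intergenic regions.

```python
# Counts copied from the "DNA contamination" row for sample 4E2.
read_counts = {
    "Exon": 195542,
    "DNA": 4049406,
    "Intron": 244464,
    "ambiguous_RNA": 5873,
}

total = sum(read_counts.values())

# Fraction of all categorised reads that falls into each category.
fractions = {category: count / total for category, count in read_counts.items()}

print(f"total categorised reads: {total}")
for category, count in read_counts.items():
    print(f"{category}: {count} ({fractions[category]:.2%})")
```

The four categories sum to 4495285, the same number as the uniquely mapped reads in the STAR table, so the fractions can be read as shares of the uniquely mapped reads.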
Anyway, I also checked your original bam file, and the transcript you mentioned is indeed not in the bam. Is it possible that the bam and reference files do not match? If so, I suggest redoing the mapping step and trying again.
best wishes
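One way to test whether the bam and the reference annotation match is to compare the transcript ids in longest.transcripts.info.txt with the reference names stored in the bam header. The sketch below assumes a transcriptome-aligned bam (such as STAR's Aligned.toTranscriptome output) and that trans_id is the second tab-separated column, as in the header listing earlier in this thread. The function names are hypothetical; only `pysam.AlignmentFile` and its `references` attribute come from pysam.

```python
def read_transcript_ids(info_path):
    """Collect the trans_id column (second field) from the transcript info file."""
    ids = []
    with open(info_path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1:
                ids.append(fields[1])
    return ids


def missing_from_bam(transcript_ids, bam_references):
    """Return the transcript ids that are absent from the bam reference names."""
    known = set(bam_references)
    return [tid for tid in transcript_ids if tid not in known]


def bam_reference_names(bam_path):
    """Read the reference sequence names from a bam header (requires pysam)."""
    import pysam  # third-party; imported here so the helpers above work without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return list(bam.references)
```

For example, `missing_from_bam(read_transcript_ids("longest.transcripts.info.txt"), bam_reference_names("sample.bam"))` lists every transcript that fetch would not find, where `sample.bam` is a placeholder path. A long list suggests that the bam was mapped against a different annotation release, or that the ids differ in form (for instance with and without a version suffix).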
Thanks for your reply and for testing! I will try again.
Hi, I ran MetageneAnalysis with hg38.longest.transcripts.info.txt from the RiboMiner package and got the error ValueError: invalid literal for int() with base 10: 'CDS_length':
I don't know what the problem is. Is there any way to solve it? Thanks!
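This error means that the text 'CDS_length' itself was passed to int(), which typically happens when a header line is read as a data row. One possible cause (an assumption, not confirmed in this thread) is a repeated or misplaced header line in the info file. The sketch below scans a tab-separated file for non-integer values in a chosen column; `find_non_integer_cells` is a hypothetical helper, not part of RiboMiner.

```python
def find_non_integer_cells(path, column="CDS_length"):
    """Return (line_number, value) pairs for data rows whose `column` is not an integer."""
    bad = []
    with open(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        # Raises ValueError if the first line does not contain the column name,
        # which would itself indicate a malformed header.
        index = header.index(column)
        for line_number, line in enumerate(handle, start=2):
            fields = line.rstrip("\n").split("\t")
            value = fields[index] if index < len(fields) else ""
            try:
                int(value)
            except ValueError:
                bad.append((line_number, value))
    return bad
```

Running `find_non_integer_cells("hg38.longest.transcripts.info.txt")` and inspecting any reported line numbers shows whether the file contains such rows. If the list is empty, the file itself is probably fine and the cause lies elsewhere.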