Closed MehdiM98 closed 10 months ago
Hi,
Thank you for your interest in the software. We recommend at least 10Gb of memory to run most analyses, though we prefer to allow up to 20Gb as a safety buffer. How long has it been running? If you could provide the command line and perhaps the log output of the tool, we might be able to see what stage it is at.
Thanks.
Hello Oliver, Thanks for the quick response. Here is where the software gets stuck:
I am using it in an HPC
INFO @ Thu, 29 Jun 2023 02:54:01:
# ARGUMENTS LIST:
# name = CHD3
# treatment files = ['HT1_AlignedSorted.bam', 'HT2_AlignedSorted.bam', 'HT3_AlignedSorted.bam']
# control files = ['WT1_AlignedSorted.bam', 'WT2_AlignedSorted.bam', 'WT3_AlignedSorted.bam']
# GTF file = mm10.refGene.gtf
# TE file = mm10_rmsk_TE.gtf
# multi-mapper mode = multi
# stranded = no
# differential analysis using DESeq2
# normalization = DESeq2_default
# FDR cutoff = 5.00e-02
# fold-change cutoff = 1.00
# read count cutoff = 1
# number of iteration = 100
# Alignments grouped by read ID = False
INFO @ Thu, 29 Jun 2023 02:54:01: Processing GTF files ...
INFO @ Thu, 29 Jun 2023 02:54:01: Building gene index ....... 100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
INFO @ Thu, 29 Jun 2023 02:55:10: Done building gene index ......
INFO @ Thu, 29 Jun 2023 02:55:13: Building TE index .......
INFO @ Thu, 29 Jun 2023 02:58:43: Done building TE index ......
Reading sample files
Hi,
Sorry for my slow response. It looks like it is reading in the file(s). Is it still stuck? If so, could you give me an excerpt of your BAM files (including the header), and I will see if it also hangs on our system.
Thanks, and my apologies.
Thank you for your cooperation. ill paste it here even if its binary: but also i can see a difference between my samples and the test samples shared on your lab web page. This is what I get with head -1 sample.bam
&ӚDc%03�d03��CҨH���S,7BEP��"]���E�tgA��g��=gTp�J��;��q��S
C�e:s��y�8I�<�t~r��hV��r;ɞYz2�T�,N�0�H�W�yxg�*��KDJ�q�tyi1Iz$�D��:I�$�I�/j)�F�>�R1��p��eL�(���q4:��65QB`�bh�Q\0�+�y�%h�i�Za!\LF
from one of your samples I get:
]*.�%nY��f������\@jZF�5U��Qt]RNZ�aH���%���ߒk�`��0�� ��6�g��o�n��^����,�V{�&L�7�
/8���_ǐ$���g0�!�Ťc�h��TJ�lK�&��n=�z�5$5��quض��{{�+�w[ְ=��s�>�JM�)'�V���j5������,q��0�]/�bW�X<�#ѝڢ�()
Thank you
Hi,
Thank you. Unfortunately, the binary portion is not that informative. You can get the text output by doing the following:
$ samtools view -h sample.bam | head -n 10000 > sample_excerpt.sam
Samtools can be installed here.
Thanks.
Thank you for the response, I attach the file right here. THank you sample_excerpt.zip
Hi,
There are a couple of things that I noticed:
1) Your chromosome names are not using the UCSC nomenclature (e.g. chr1
) that is typical for the mm10/mm39 genome builds. Instead, they are using NCBI Refseq nomenclature (e.g. NC_000067.7
)
2) Based on the chromosome nomenclature (e.g. NC_000067.7
), the genome build corresponds to the GRCm39 build. Thus, mm10 would not be the correct genome build for the gene or TE annotation.
Thus, there are two options: 1) Realign your samples to a genome FASTA sequence that uses either the UCSC (mm39), Ensembl (GRCm39) or GENCODE nomenclature (GRCm39). You should then use the gene and TE GTF corresponding to that genome build (we provide TE GTF for mm39, GRCm39 Ensembl and GRCm39 GENCODE) 2) You can use the TE GTF that has the GRCm39 NCBI chromosome names. You will also need to use a gene annotation that uses the NCBI chromosome names, which you can check by looking at the first column of the GTF file.
Please let me know if you have any questions.
Thanks.
Thank you sir, I was doubting this too. That's very helpful, this will solve the problem. Your help is very much appreciated
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
Hello, I am trying to run the TEtranscripts on 3 treatment and 3 control files (each is approximately 4 to 6gb). The gtf files are indexed but the scripts take forever to run. How much memory is needed to run the program on mouse genome.
Thank you