Closed distilledchild closed 2 years ago
Yes, of course you can use HiSIF for any kind of organism.
Please note the following two points for other organisms:
All chromosomes need to be changed to numbers: 1, 2, 3, ...
If you use non-human data, the bed files of enzyme digestion sites (like hindIII.Hg19.HiCPLD.bed for human data, placed under the resources folder) have to be prepared before HiSIF is used. The HiC-Pro Digest Genome tool can make bed files from a genome; please refer to: https://nservant.github.io/HiC-Pro/UTILS.html#digest-genome-py
Thank you.
@yufanzhouonline thank you for your answer! I am just curious: in which file should the chromosomes be changed into numbers (1, 2, 3)? I am using a bam file that has 'chr1, chr2, ...', and it was already aligned with those names (e.g. chr1, chr2) in HiC-Pro.
There are 2 input files, ref.fasta and the cutting fragment file. Do you mean I have to change 'chr' into plain numbers in both files? I ask because I ran runhisif.sh with the config file, and after that I was NOT able to see any "chr" in the .pairs, '.bwt2pairs.bam.pairs', and 'chrXX.tmp' files.
All the chromosome tmp files are even generated automatically, so it seems I don't need to make the other chromosomes' tmp files (for troubleshooting #5). I just found that my strain has 20 + X, Y chromosomes, but there are 25 tmp files in total (from chr1.tmp to chr25.tmp). Everything seems automated, right?
Also, I keep encountering the “Segmentation fault” error (troubleshooting #3) even though I am running only chromosome 12. (I do have all the other chromosome tmp files.) All params are the defaults from the setting except read length = 150, and the reference genome is also chr12 only. Could you give me some suggestions, please?
It looks like you are running HiSIF as described in "Quick Start". runhisif.sh can only be used for the human genome.
If you run another organism, you have to follow the instructions in "Customized Running".
There are three steps to run HiSIF:
Pre-Processing
Creating the chr-by-chr files
Running HiSIF
In the first step, pre-processing, convert the BAM/SAM files to a 6-column text file yourself:
chr1 pos1 strand1 chr2 pos2 strand2
Strand is 1 for the positive strand and 0 for the negative strand. Each chromosome needs only the number; for human, chrX is 23 and chrY is 24. Similarly, use 1, 2, 3, ... for other organisms.
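The 6-column conversion described above can be sketched in Python. This is only an illustration, not part of HiSIF: the function names are hypothetical, the space-separated output is an assumption based on the column layout shown, and numbering X and Y directly after the autosomes for non-human genomes (21 and 22 for rat) is an assumption by analogy with the human case (23 and 24).

```python
import re

# Strand encoding stated in the thread: 1 = positive, 0 = negative.
STRAND_FLAG = {"+": 1, "-": 0}


def chrom_to_number(name, n_autosomes=20):
    """Map 'chr12', '12', 'chrX' or 'chrY' to an integer.

    Assumption: X and Y are numbered right after the autosomes,
    as in the human case (22 autosomes -> X = 23, Y = 24).
    """
    label = re.sub(r"^chr", "", name, flags=re.IGNORECASE).upper()
    if label == "X":
        return n_autosomes + 1
    if label == "Y":
        return n_autosomes + 2
    number = int(label)  # raises ValueError for names such as 'chrUn_...'
    if not 1 <= number <= n_autosomes:
        raise ValueError(f"unexpected chromosome name: {name!r}")
    return number


def to_six_columns(chr1, pos1, strand1, chr2, pos2, strand2, n_autosomes=20):
    """Build one 'chr1 pos1 strand1 chr2 pos2 strand2' line with numeric fields."""
    fields = [
        chrom_to_number(chr1, n_autosomes), int(pos1), STRAND_FLAG[strand1],
        chrom_to_number(chr2, n_autosomes), int(pos2), STRAND_FLAG[strand2],
    ]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    # Hypothetical read pair, rat numbering (20 autosomes).
    print(to_six_columns("chr12", 1000, "+", "chrX", 2500, "-"))
```

Unplaced contigs and other non-numeric names raise an error here instead of being guessed at, since the thread does not say how they should be numbered.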
Please refer to the "Customized Running" section at the link: https://github.com/yufanzhouonline/HiSIF
Thanks.
@yufanzhouonline Thank you for your guidance. I followed your advice, but it keeps failing with “Segmentation fault” (#3) and only produces (SAMPLE)_t1_PerChr.txt, even though I run only one chromosome (the shortest one, the 12th). I even tried 1/10 of chr12.tmp and still got the memory error, so I think something is wrong, because I ran it on a machine with 1T of RAM.
Case1 working process for chr12:
HiSIF -g /user/bowtie2_ref -c /user/HiSIF_V1.00/resources/hic_pro_edited_for_hisif.bed -w 36 500 3000 -p 1 29 -t 1 -i 2 ./SHR
Error: could not read the first line : Is a directory
wc: /user/bowtie2_ref/.: Is a directory
Error: could not read the first line : Is a directory
wc: /user/bowtie2_ref/.: Is a directory
(=:...........Start processing files...........:=)
cuttingSiteTotal == 381448
<-----Parsed enzyme cutting site map----->
Segmentation fault (core dumped)
Case2 working process for chr12:
So, in both cases, which use different processes to create chr12.tmp, the errors are the same.
I think something may be wrong in the CUTTING_FRAGMENTS file from digest-genome-py. The enzyme cutting fragment file has 381448 lines for chromosome 12.
The first 10 lines of the file are here:
chr12 31 36 HIC_chr12_1 0 +
chr12 56 61 HIC_chr12_2 0 +
chr12 78 83 HIC_chr12_3 0 +
chr12 103 108 HIC_chr12_4 0 +
chr12 225 230 HIC_chr12_5 0 +
chr12 235 240 HIC_chr12_6 0 +
chr12 300 305 HIC_chr12_7 0 +
chr12 424 429 HIC_chr12_8 0 +
chr12 449 454 HIC_chr12_9 0 +
chr12 471 476 HIC_chr12_10 0 +
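One quick way to sanity-check a fragment file like this is to look at the interval lengths and the gaps between consecutive rows. The Python sketch below does that for the ten rows quoted above; the helper names are hypothetical and it is not part of HiSIF or HiC-Pro. In these rows every interval is 5 bp long and consecutive rows do not touch, so they look more like short windows around cut sites than fragments spanning the distance between neighbouring sites. Whether HiSIF expects one form or the other is not stated in this thread.

```python
# The ten rows quoted in the post, copied verbatim.
BED_HEAD = """\
chr12 31 36 HIC_chr12_1 0 +
chr12 56 61 HIC_chr12_2 0 +
chr12 78 83 HIC_chr12_3 0 +
chr12 103 108 HIC_chr12_4 0 +
chr12 225 230 HIC_chr12_5 0 +
chr12 235 240 HIC_chr12_6 0 +
chr12 300 305 HIC_chr12_7 0 +
chr12 424 429 HIC_chr12_8 0 +
chr12 449 454 HIC_chr12_9 0 +
chr12 471 476 HIC_chr12_10 0 +
"""


def parse_bed(text):
    """Parse whitespace-separated 6-column BED rows into (chrom, start, end, name)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name, _score, _strand = line.split()
        records.append((chrom, int(start), int(end), name))
    return records


def interval_lengths(records):
    """Length (end - start) of each row."""
    return [end - start for _, start, end, _ in records]


def gaps_between(records):
    """Distance from the end of each row to the start of the next one."""
    return [nxt[1] - cur[2] for cur, nxt in zip(records, records[1:])]


if __name__ == "__main__":
    rows = parse_bed(BED_HEAD)
    print("interval lengths:", interval_lengths(rows))
    print("gaps between rows:", gaps_between(rows))
```

Running the same two helpers over the whole file, rather than the first ten rows, would show whether this pattern holds throughout.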
Do you have any suggestions for this situation?
Also, one additional question: if I use multiple enzymes, what fragment size would be good? I am using the enzymes ^GATC, ^ANTC, C^TNAG, and T^TAA.
If possible, could I contact you by email?
Thank you.
I think the memory shortage on 1T of RAM happens because there are too many enzyme fragments: mine has 43344947 from 4 enzymes in 22 chromosomes, and chr12 has 3353860 pairs.
If you run only one chromosome, please refer to #5 of the "Troubleshooting" section at the link:
https://github.com/yufanzhouonline/HiSIF
Please contact me via email: zhouy4@uthscsa.edu if you have any further questions.
Thank you.
Hi, I am using rat data digested with a mixture of 4 enzymes and processed with HiC-Pro (*.bwt2pairs.bam). (The fragment file is also from HiC-Pro.) I found that the chromosome numbers are a little bit different. Can I use HiSIF for non-human data? (A rat has 20 autosomes + X, Y.)
Thank you.