nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

error while running HiCPro2FitHiC.py #389

Closed priyatamapandey closed 3 years ago

priyatamapandey commented 3 years ago

Hi, I used the Singularity container to run HiC-Pro, and it generated all the files. Next I want to use FitHiC. To do that I am running hicpro2fithic.py, but it ended with an error. It did generate 3 files, but I think they are incomplete: when I passed them to FitHiC, it gave me an error again.

The command I used is

singularity shell --bind /project/roselai_228  /project/wiemels_260/priya/myTools/hicpro_3.0.0_ubuntu.img 
/HiC-Pro-devel_py3/bin/utils/hicpro2fithic.py -i matrix/SRR1030745/raw/1000000/SRR1030745_1000000.matrix -b matrix/SRR1030745/raw/1000000/SRR1030745_1000000_abs.bed -s matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix.biases -o output

And the error I got is below

[Screenshot of the error message, 2020-12-11; image not included]

My guess is that there may be an issue in the output generated by HiC-Pro. FYI, the command I used for HiC-Pro is

singularity shell --bind /project/roselai_228 /project/wiemels_260/priya/myTools/hicpro_3.0.0_ubuntu.img
Singularity> HiC-Pro -i /project/roselai_228/priyatap/HiC_work/fastq/SRR1030745 -o /project/roselai_228/priyatap/HiC_work/output -c /project/roselai_228/priyatap/HiC_work/annotationFiles/config-hicpro.txt

Please help me figure out this issue. Thank you, Priya

nservant commented 3 years ago

Hi, would you mind sharing the input files with me, so that I can try to reproduce the bug? Thanks

priyatamapandey commented 3 years ago

Hi, Thank you for your reply. I also asked the FitHiC group, and they found that my _abs.bed file has one row fewer than the biases file, which seems to cause the error.

Attached please find the input files. troubleshooting.zip

Thank you so much for help, Priya
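The row-count mismatch reported above can be checked before running hicpro2fithic.py. The following is a minimal sketch, assuming both the _abs.bed file and the .biases file are plain text with one record per line; the function names are hypothetical and not part of HiC-Pro or FitHiC.

```python
from pathlib import Path


def count_rows(path):
    """Count the non-empty lines in a plain-text file."""
    with Path(path).open() as handle:
        return sum(1 for line in handle if line.strip())


def check_bins_match(bed_path, biases_path):
    """Return (n_bed, n_biases, ok) for an _abs.bed / .biases pair.

    ok is True only when both files describe the same number of bins.
    """
    n_bed = count_rows(bed_path)
    n_biases = count_rows(biases_path)
    return n_bed, n_biases, n_bed == n_biases
```

Calling `check_bins_match` on the `-b` and `-s` files from the command in the first comment would show whether the two inputs disagree on the number of bins.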

priyatamapandey commented 3 years ago

Hi, Have you been able to reproduce the error? Any idea what is causing it?

Thank you, Priya

priyatamapandey commented 3 years ago

Hi,

In the meantime I tried to run the older version 2.11.4 from https://zerkalo.curie.fr/partage/HiC-Pro/singularity_images/hicpro_latest_ubuntu.img and received an error. I used the same command as earlier. Here is the error

[Screenshot of the error message, 2020-12-16; image not included]

Please help me resolve this issue.

Thank you,

nservant commented 3 years ago

Hi, I figured out what the issue is, but I do not understand how it is possible... In your matrix you have 3114 bins, but in the bias file you have 3115 values. Could you please check which iced version you are using, and update it to the latest version if necessary? Thanks

priyatamapandey commented 3 years ago

Hi,

Thank you for checking it. It used version 0.5.6 of iced. I am pasting the ice log file:

`
cat ice_1000000.log
ice --results_filename hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove-all-zeros-loci --output-bias 1 --verbose 1 hic_results/matrix//SRR1030745/raw/1000000/SRR1030745_1000000.matrix
/usr/local/anaconda/lib/python3.7/site-packages/iced/normalization/_ca_utils.py:9: UserWarning: The API of this module is likely to change. Use only for testing purposes
  "The API of this module is likely to change. "
Using iced version 0.5.6
Loading files...
Normalizing...
Filter 264 out of 3115 bins ...
Matrix is triangular superior
Writing results...
`

I also want to let you know that I tried the older HiC-Pro Singularity container, version 2.11.4, and it worked. I then ran the utility script hicpro2fithic.py, and that also worked, although I am now getting an error in FitHiC. Do you think the newer version may have a bug? I am pasting the same ice log file as generated by version 2.11.4. In this case the iced version is 0.4.2.

`
cat ice_1000000.log
ice --results_filename hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove-all-zeros-loci --output-bias 1 --verbose 1 hic_results/matrix//SRR1030745/raw/1000000/SRR1030745_1000000.matrix
Using iced version 0.4.2
Loading files...
Normalizing...
Filter 263 out of 3114 bins ...
Matrix is triangular superior
ICE at iteration 1 156.18873409511477
ICE at iteration 2 41.63980704529987
ICE at iteration 3 11.578281107643033
ICE at iteration 4 3.2393413338099606
ICE at iteration 5 0.911371468541986
ICE at iteration 6 0.2572246911103202
break at iteration 7
Writing results..
`

Thank you, Priya
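The two logs differ in the reported iced version and in the total number of bins (3115 versus 3114). A small sketch for pulling those values out of an ice log is given here, assuming the log contains the "Using iced version" and "Filter N out of M bins" lines exactly as pasted above; `summarize_ice_log` is a hypothetical helper, not part of iced or HiC-Pro.

```python
import re

# Patterns taken from the log lines pasted in this thread.
VERSION_RE = re.compile(r"Using iced version (\S+)")
FILTER_RE = re.compile(r"Filter (\d+) out of (\d+) bins")


def summarize_ice_log(text):
    """Extract the iced version and bin counts from an ice log, if present."""
    version = VERSION_RE.search(text)
    bins = FILTER_RE.search(text)
    return {
        "iced_version": version.group(1) if version else None,
        "filtered_bins": int(bins.group(1)) if bins else None,
        "total_bins": int(bins.group(2)) if bins else None,
    }
```

Comparing `total_bins` from the log with the number of rows in the matching _abs.bed file is one quick way to spot the off-by-one discussed in this issue.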

nservant commented 3 years ago

Thank you Priya. That's good to know. I'll check with the iced developer.

For fitHiC, please use the fitHiC google group to find help

nservant commented 3 years ago

Hi Priya, One short question. Did you compare the two bias files that you have (from iced 0.5.6 and iced 0.4.2)? Are they similar? I would expect the output of iced 0.5.6 to have an extra line, hopefully the first one. Thank you, best

priyatamapandey commented 3 years ago

Hi, that is right. Here are the first 10 lines of each file.

iced 0.4.2 3114 lines

[priyatap@discovery1 1000000]$ head -n 10 SRR1030745_1000000_iced.matrix.biases
1.259179284464612414e-01
4.220329226397311895e-01
4.778662319336556830e-01
4.565832414325673438e-01
7.838514289932088097e-01
9.350828020805725949e-01
7.184684395304455906e-01
7.690394973140728396e-01
9.585290308679312865e-01
6.438992048300308246e-01

iced 0.5.6 3115 lines

[priyatap@discovery2 1000000]$ head -n 10 SRR1030745_1000000_iced.matrix.biases
nan
1.259179284464615189e-01
4.220329226397306899e-01
4.778662319336555719e-01
4.565832414325678434e-01
7.838514289932090318e-01
9.350828020805728169e-01
7.184684395304433702e-01
7.690394973140743939e-01
9.585290308679310645e-01

Is that the sole cause of the error?

Thanks, Priya
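The extra leading `nan` shown above suggests one possible stopgap: drop the first bias value when it is NaN and the file is exactly one line longer than the number of bins. This is only a sketch under that assumption. It is not a fix confirmed by the HiC-Pro or iced maintainers in this thread, and the function names are hypothetical.

```python
import math


def read_biases(path):
    """Read one float per line; float() accepts the 'nan' token."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def drop_extra_leading_nan(biases, n_bins):
    """Trim biases to n_bins when the only surplus is a single leading NaN.

    Any other mismatch is returned unchanged so it is not silently hidden.
    """
    if len(biases) == n_bins + 1 and math.isnan(biases[0]):
        return biases[1:]
    return biases
```

Here `n_bins` would be the number of rows in the _abs.bed file (3114 in this case). Whether the trimmed values line up correctly with the bins should still be verified, for example against the iced 0.4.2 output.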

nservant commented 3 years ago

Thank you Priya. I asked @NelleV to look at it! Best

priyatamapandey commented 3 years ago

Hi, I have a few paired-end files and one single-end fastq file. How should I proceed with the single-end fastq file? I keep each sample in a separate subfolder under the main input folder.

Thank you, Priya

nservant commented 3 years ago

Hi Priya, HiC-Pro cannot handle single-end data. And actually, this is the first time I'm seeing single-end data for Hi-C... Sorry for that. Best

priyatamapandey commented 3 years ago

Hi, I downloaded this SRA file from the below link. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1267209

If you check the description at the link, it lists only one file of the pair (maybe just the forward reads file):

Description - CTCF_NP_R1.fastq.gz

Is there any way I can run this file using HiC-Pro?

Thank you, Priya

nservant commented 3 years ago

Hi Priya, This is not a Hi-C dataset. I think this is a CTCF ChIP-seq dataset ;) Best

priyatamapandey commented 3 years ago

The detailed description confused me. Thanks for noticing.

Priya

priyatamapandey commented 3 years ago

Hi, I want to do SNP-specific analysis. I want to run your sample data first, but I cannot find the example input file mgp.v2.snps.annot.reformat.vcf. Could you please point me to the exact link?

Thank you, Priya

nservant commented 3 years ago

Hi Priya, Are you working on mouse? What are the parental strains? Best

priyatamapandey commented 3 years ago

Hi, I am working on human brain cells. I want to limit my analysis to a few functional SNPs, so that I can get more interactions around an anchor/SNP in a chromosome. Is there any parameter or other way to limit interactions to a single chromosome instead of genome-wide? Maybe that way I can generate more interactions in the region of interest. Thank you, Priya

nservant commented 3 years ago

Hi Priya, If you only want to look at a few SNPs, I think you can do it by hand (with a custom script). The allele-specific mode of HiC-Pro is mainly useful if you have a list of all phased SNPs genome-wide and you want to distinguish the interactions between the parental chromosomes. Best

priyatamapandey commented 3 years ago

Hi, Thank you for your reply. I am exploring HiC-Pro's options further so that I can make use of your tool. I want to generate a result similar to the screenshot below, which is focused on a single genomic position. In the image, my region of interest is the window in the first two columns, which interacts with a moving 40 kb window.

[Screenshot of the desired output table, 2021-01-11; image not included]

I found that a capture target BED file can be given as an input to HiC-Pro. What exactly is this file? Can I give it a 40 kb window or something similar?

[CAPTURE_TARGET] | BED file of target regions to focus on (mainly used for capture Hi-C data)

Thanks for bearing with me, Priya

priyatamapandey commented 3 years ago

Hi, After running hicpro2fithic.py, I found that the output file named fragmentMappability.gz is not exactly in the input format that FitHiC requires.

Here is the screenshot.

[Screenshot of the fragmentMappability.gz contents, 2021-01-17; image not included]

This is the fragments file for FitHiC. I am pasting its description below:

The -f argument is used to pass in a full path to what we deem a 'fragments file,' Each line will have 5 entries. The second and fifth fields can be any integer as they are not needed in most cases. The first field is the chromosome name or number, the third field is the coordinate of the midpoint of the fragment on that chromosome, the fourth field is the total number of observed mid-range reads (contact counts) that involve the specified fragment. The fields can be separated by space or tab. All possible fragments need to be listed in this file.

whereas it should look like the example below.

[Screenshot of the expected fragments file format, 2021-01-17; image not included]

I can see that my file has fragment entries in both column 2 and column 3. Could you please explain why I am seeing this kind of output?

Thanks, Priya
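The five-field layout quoted in the description above can be checked line by line. The sketch below follows only that description (field 1 chromosome, field 3 midpoint, field 4 contact count, fields 2 and 5 unused) and assumes the midpoint and count are integers; `parse_fragment_line` is a hypothetical helper, not part of FitHiC.

```python
def parse_fragment_line(line):
    """Split one fragments-file line into the fields the description names."""
    fields = line.split()  # the description allows space or tab separators
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    # Fields 2 and 5 are described as unused, so they are ignored here.
    chrom, _unused_2, midpoint, count, _unused_5 = fields
    return chrom, int(midpoint), int(count)
```

Assuming the file is gzip-compressed, as its name suggests, it could be opened with `gzip.open(path, "rt")` and each line passed to this function to see which column actually holds the midpoint.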

nservant commented 3 years ago

Hi Priya, The hicpro2fithic.py script was actually written by the FitHiC team :) so I'm not really an expert. Would you mind asking the question on the FitHiC Google group? Thanks

priyatamapandey commented 3 years ago

Hi Nicolas, I am using target capture to see interactions around a region. I did not find any example of a capture target BED file, so I am experimenting with the length of the region, that is, changing the start and end positions in my capture target BED file from a few base pairs up to 40 kb. My goal is to get the interactions of the target region with 1 Mb on either side of it.

So my question is: if only my capture target file changes, do I have to rerun HiC-Pro from the beginning each time, or can I start from the build_contact_maps step? I would also appreciate a suggestion for the capture target file bin size.

Thank you, Priya
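For experimenting with different target lengths as described above, the BED lines can be generated rather than edited by hand. This is a minimal sketch that assumes a standard BED line (0-based start, end exclusive) with hypothetical coordinates and names. It does not answer which HiC-Pro steps must be rerun, and what HiC-Pro requires of a capture target file beyond BED format is not stated in this thread.

```python
def target_window(chrom, position, flank, name="target"):
    """Return a BED line spanning position +/- flank (0-based, end exclusive).

    The start is clamped at 0; the end is not clamped to the chromosome length.
    """
    start = max(0, position - flank)
    end = position + flank
    return f"{chrom}\t{start}\t{end}\t{name}"
```

For example, `target_window("chr1", 1_500_000, 20_000)` gives a 40 kb window centred on a hypothetical position, and changing `flank` produces the other window sizes to compare.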