jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

Generate basic report without meme comparison #54

Open kaillahs opened 3 months ago

kaillahs commented 3 months ago

Hi! I am using tf-modisco lite on data generated using chrombpnet. I generated the modisco.h5 file and want to now generate the report. Is it still possible to generate the report using the command modisco report -i modisco_results.h5 -o report/ -s report/ without the -m meme.txt input or has that option been omitted (I am getting an error message alerting me that the pipeline is missing the meme.txt input)? Thank you in advance for your help!

jmschrei commented 3 months ago

Yes, you shouldn't need the MEME file if you don't want to map your learned motifs to a database. Have you checked to make sure you are using the latest version? Can you post the entire command you used and the error message you got?

kaillahs commented 3 months ago

Hi Jacob - thank you for your quick reply. I actually moved all of my work to a different PC and the error doesn't seem to be occurring anymore! However, the motif.html that it generated is not able to load any of the cwm images...

[Screenshot 2024-08-07 at 12:32:23 PM]

kaillahs commented 3 months ago

The above report was for a specific region, using the chromBPNet tutorial model. I just tried using tfmodisco-lite on my own chrombpnet model (still using the tutorial bias model), but the report isn't showing any motifs this time. Any idea why this may be happening? I used the following commands for both the profile and counts scores:

modisco motifs -i model_2.profile_scores.h5 -n 1000000 -o modisco_profile_results.h5
modisco report -i modisco_profile_results.h5 -o report/ -s report/

jmschrei commented 3 months ago

Are you running the report command in the same directory you're running the motifs command? What's in the report/ directory after running it?

kaillahs commented 3 months ago

Yes, I am running both commands in the same directory. The report folder contains an empty trimmed_logos folder as well as a motifs HTML file.

kaillahs commented 3 months ago

I generated another run with a different region and got 4 patterns, however, the same issue is occurring where the images are not fully showing up in the HTML despite being in the trimmed_logos folder.
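
When logos exist in trimmed_logos but do not render, one thing worth ruling out is whether the image paths written into the HTML actually resolve from the HTML file's own location. The following is a minimal diagnostic sketch, not part of tfmodisco-lite: `ImgSrcCollector` and `find_broken_images` are hypothetical helper names, and it assumes the report embeds its logos as ordinary `<img src="...">` tags with local paths.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_broken_images(html_path):
    """Return the <img> src values that do not point to an existing file.

    Relative paths are resolved against the folder containing the HTML file,
    which is how a browser resolves them when the file is opened locally.
    """
    html_path = Path(html_path)
    parser = ImgSrcCollector()
    parser.feed(html_path.read_text(encoding="utf-8"))

    broken = []
    for src in parser.sources:
        if src.startswith(("http://", "https://", "data:")):
            continue  # remote or inline images are not checked here
        if not (html_path.parent / src).is_file():
            broken.append(src)
    return broken
```

Calling `find_broken_images` on the HTML file inside report/ lists every image reference that does not resolve. If the listed paths carry an extra leading folder compared with where the PNGs sit on disk, the path prefix passed to the report command is a likely place to look, though that is a guess rather than something confirmed in this thread.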

kaillahs commented 3 months ago

Hi @jmschrei - I am checking in to see if you have any suggestions to resolve the above-mentioned issues... I've been running other regions through the pipeline, all of which are successfully creating pred_bw files that can be viewed through IGV, and a lot of them are not generating any motifs. Thank you in advance!

jmschrei commented 3 months ago

Hi @kaillahs. Unfortunately, issues like this are usually challenging to debug remotely. But what do you mean the report is generated for a specific region? Are you running tfmodisco on a single example? The report should be generated genome-wide. I do not believe that there are over 100 real seqlets in a single example.

kaillahs commented 3 months ago

@jmschrei - thank you for your reply. I've been using the chrombpnet contribs_bw pipeline which requires you to enter a specific region of the genome you want to run through the command. The region input is a 10-column BED file I created for a region of interest of approx. 3000 bp. I then used the output_prefix.profile_scores.h5 and the output_prefix.counts_scores.h5 for the -i argument of the modisco motifs command. You mentioned that the modisco command should be run on the whole genome... Does this mean that I need to alter the regions input for the contribs_bw command, or the -i input for the modisco command?

jmschrei commented 3 months ago

Sorry, but I'm actually unfamiliar with how the chrombpnet repo works. @austintwang do you happen to know?

panushri25 commented 3 months ago

@kaillahs 3000 bp is a very small region for computing DeepSHAP scores and summarizing seqlets with modisco. What are you trying to do?

Modisco is a summarization tool used, for example, to get a list of the motifs found across all of your peaks.
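
For a sense of scale, it can help to check how many regions and how many base pairs a regions BED file covers before computing contribution scores. This is a minimal sketch under stated assumptions: `summarize_bed` is a hypothetical helper, it reads only the first three BED columns (chrom, start, end), and it does not merge overlapping intervals, so overlapping regions are counted twice.

```python
def summarize_bed(lines):
    """Return (number of regions, total bp) for an iterable of BED lines.

    BED coordinates are 0-based and half-open, so each width is end - start.
    """
    n_regions = 0
    total_bp = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        n_regions += 1
        total_bp += end - start
    return n_regions, total_bp
```

To use it on a file, pass an open file handle, e.g. `with open(path) as fh: summarize_bed(fh)`, where `path` is a placeholder for your own regions file. A single region of about 3000 bp gives modisco very little to summarize compared with a genome-wide peak set.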

kaillahs commented 3 months ago

@panushri25 I am trying to get transcription factors across specific peak regions in the genome. These regions tend to be fairly small... Would it be best to just extend the region to around 300,000 bp? I've done this once and it seems to work just fine so I'm assuming the issues I was running into were simply size-related.

kaillahs commented 2 months ago

Hi! I am noticing that my match q-values are far lower for the counts score modisco reports than for the profile scores. Is there a reason for this? Also, should there be a difference in the quality of the modisco motif predictions depending on how far out I extend the regions? I seem to be getting around 8 predictions regardless of whether I extend my region by 360,000 bp or 150,000 bp, 4 of which are present in both for the count predictions and 1 of which is present in both for the profile predictions.