Closed kaillahs closed 2 months ago
The above issue was accidentally closed - help would still be appreciated with the above questions.
Hi,
I have run into some additional questions regarding the training of my bias model since I last posted this question. I am concerned about the results in the training reports, specifically the training/validation loss curves. Below I have described the 3 models I have trained so far as well as my concerns about each.
Model 1: This model was trained on the total scATAC-seq data set from muscle lysate. I found that the total loss for both the training and validation curves is generally very high (between 400 and 500) and that the separation between the two curves is also concerningly large (~60). This makes me think that the model may not be generalizing from the training data very well and that the training set size may need to be increased. However, the current (and suggested) train/validation split has 3 test chromosomes, 2 validation chromosomes, and 16 training chromosomes, making it hard to increase the training set.
Model 2: I have tried subsetting the original ATAC-seq data to a specific cell type of interest (while keeping the original split), and this seems to decrease the total loss for the training and validation curves (between 230 and 265) but maintains the gap at around 28.
Model 3: Lastly, I decreased model 2's training set (5 test, 2 validation, 14 training chromosomes) to check that the model is not just memorizing specific patterns, but this did not change the curves and worsened other metrics in the QC report.
It is my understanding that the definition of a "good" training/validation loss curve varies depending on the project. My concern stems from the fact that the example QC report provided by chromBPNet features a graph with a training loss between 155 and 161 and a gap of only 4.
Though all other metrics meet the cut-offs, I am wondering if this may be related to the fact that my pearsonr value for nonpeaks tends toward the lower end (0.13, 0.29, and 0.00, respectively). If so, how would you suggest fixing this issue?
I have attached the QC reports below for your convenience.
Model_1_QC.pdf Model_2_QC.pdf Model_3_QC.pdf
Thank you!
Model 1 looks just fine. Note the y-axis of the loss plots: they do not run from 0 to X but span the range of the training and validation losses, so you can't really compare the loss values between runs directly. And the separation between training and validation loss is not unexpected. It does not indicate overfitting, since both loss curves are decreasing across epochs and early stopping is implemented to avoid overfitting (i.e., training loss decreasing while validation loss increases). The learned features look just fine. I would go ahead with the model.
-A
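As background, the early-stopping behaviour described above can be sketched generically (this is not chrombpnet's implementation; the function name and patience value are illustrative):

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch at which training would halt: the first epoch where
    validation loss has not improved for `patience` consecutive epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch          # stop: no improvement for `patience` epochs
    return len(val_losses) - 1    # ran out of epochs before triggering

# Validation loss stops improving after epoch 2; with patience=3, training
# would halt at epoch 5 and the epoch-2 weights would be kept.
print(early_stop_epoch([5, 4, 3, 3.5, 3.6, 3.7, 3.8], patience=3))  # 5
```

With a rule like this in place, a persistent train/validation gap is tolerable as long as the validation loss itself is still decreasing.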
Reopened #200 https://github.com/kundajelab/chrombpnet/issues/200.
Anshul - thank you for this guidance! I went ahead and tried training chromBPNet using bias model 2. During training, I received the following error message and am unsure how to resolve it. This is not an error I've run into when following the tutorial or when training the bias models. Thank you in advance.
Generating 'profile' shap scores
Done 0 examples of 29992
[...progress lines omitted...]
Done 7400 examples of 29992
Traceback (most recent call last):
File "/home/colin/miniconda3/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
sys.exit(main())
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 23, in main
pipelines.chrombpnet_train_pipeline(args)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 136, in chrombpnet_train_pipeline
interpret.main(args_copy)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/interpret.py", line 132, in main
interpret(model, seqs, args.output_prefix, args.profile_or_counts)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/interpret.py", line 89, in interpret
profile_shap_scores = profile_model_profile_explainer.shap_values(
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/shap/explainers/deep/deep_tf.py", line 284, in shap_values
bg_data = self.data([X[l][j] for l in range(len(X))])
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/shap_utils.py", line 58, in shuffle_several_times
return [np.array([dinuc_shuffle(s[0]) for i in range(numshuffles)])]
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/shap_utils.py", line 58, in <listcomp>
return [np.array([dinuc_shuffle(s[0]) for i in range(numshuffles)])]
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/deeplift/dinuc_shuffle.py", line 107, in dinuc_shuffle
counters[t] += 1
TypeError: 'range_iterator' object does not support item assignment
real 259m40.030s
user 177m27.716s
sys 12m57.174s
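For anyone hitting the same TypeError: the failing deeplift line increments an element of a counters object that is a (range) iterator rather than a list. A minimal illustration of the error class and the generic fix, not the actual deeplift code (the object in the traceback is a range_iterator, but the failure mode is the same):

```python
counters = range(4)           # an immutable range stands in for the object in the traceback
try:
    counters[0] += 1          # fails: range objects don't support item assignment
    err = None
except TypeError as e:
    err = str(e)              # "'range' object does not support item assignment"

counters = list(range(4))     # generic fix: materialize into a mutable list
counters[0] += 1
print(err, counters[0])
```

In practice this kind of error usually traces back to a version mismatch between the installed packages and the ones the library was written against, which is why the environment questions below matter.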
Hey @kaillahs - you haven't seen this error when running the tutorial, and the HTMLs were generated correctly?
Hi Anusri - thank you for your reply! I did not see this error when running the tutorial. All files were generated, and the overall report looked good. overall_report (10).pdf
Can you share the command used to run this?
I am in the process of training two chromBPNet models - one on a full scATAC-seq data set and one trained on a specific population of interest. The above error from Friday came up when training the model on the population of interest using this command:
time chrombpnet pipeline -ibam data/downloads/merged.bam -d "ATAC" -g data/downloads/mm10.fa -c data/downloads/mm10.chrom.sizes -p data/downloads/peaks_no_blacklist.bed -n data/output_negatives.bed -fl data/splits/fold_0.json -b bias_model_1/models/bias.h5 -o chrombpnet_model_1/
I also just encountered the following error a couple of minutes ago while trying to train the model on the full ATAC-seq data set using bias model 1, as Anshul suggested.
(chrombpnet) colin@Reynolds:~$ cd /mnt/c/Users/Colin/Documents/Programming/chrombpnet_ks24
(chrombpnet) colin@Reynolds:/mnt/c/Users/Colin/Documents/Programming/chrombpnet_ks24$ time chrombpnet pipeline -ibam data/downloads/merged.bam -d "ATAC" -g data/downloads/mm10.fa -c data/downloads/mm10.chrom.sizes -p data/peaks_no_blacklist.bed -n data/output_negatives.bed -fl data/splits/fold_0.json -b bias_model_2/models/bias.h5 -o chrombpnet_model_2/
Estimating enzyme shift in input file
/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py:108: UserWarning: !!! WARNING: Input reads contain chromosomes not in the reference genome fasta provided. Please ensure you are using the correct reference genome. If you are confident you are using the correct reference genome, you can safely ignore this message.
warnings.warn(colored(msg, 'red'))
Current estimated shift: +0/+0
Making BedGraph
sort: cannot create temporary file in '/tmp': Read-only file system
Traceback (most recent call last):
File "/home/colin/miniconda3/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
sys.exit(main())
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 23, in main
pipelines.chrombpnet_train_pipeline(args)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 21, in chrombpnet_train_pipeline
reads_to_bigwig.main(args)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 89, in main
generate_bigwig(args.input_bam_file,
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 46, in generate_bigwig
auto_shift_detect.stream_filtered_tagaligns(p1, genome_fasta_file, p2)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 56, in stream_filtered_tagaligns
out_stream.stdin.write(line)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7f018cbccb80>
Traceback (most recent call last):
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/tempfile.py", line 440, in __del__
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/tempfile.py", line 436, in close
OSError: [Errno 30] Read-only file system: '/tmp/tmp_at8nu_v'
Bus error
real 77m20.952s
user 57m10.849s
sys 10m44.816s
(chrombpnet) colin@Reynolds:/mnt/c/Users/Colin/Documents/Programming/chrombpnet_ks24$
You are receiving the latest error because you do not have write permissions on your /tmp directory, which is being used for sorting your BAM. You can provide an alternate directory for your temp files using the argument --tmpdir.
Are you using a different cluster/machine/environment for your tutorial runs versus these runs? Is the setup exactly the same?
I am just realizing that I am running low on disk space - I'll try the --tmpdir command after I free up some space, to ensure that that's not hindering anything.
I ran the tutorial and trained all bias models in Jupyter Notebooks using shell commands, but decided to start training models directly in the terminal as the notebooks started crashing and giving errors with regard to volume and speed of the output.
It's unlikely you would see these errors only on your own data and not in the tutorial, unless your setup has changed or the data does not follow the same formatting. But I think the issue is the former; I would make sure the setup is the same for your Jupyter environment versus the command line. Also try running them in Docker.
Ok, thank you. I am running into some unrelated software issues, but once that is figured out, I will try to train the tutorial model in the terminal to confirm that the setup works. I'll let you know whether or not it ends up working.
I haven't set up Docker yet but will continue to work on that. I just tried running the tutorial in the terminal and ran into a series of warnings followed by the same 'range_iterator' error message. I am confused as to why running the code in a Jupyter notebook using shell commands works but running it directly in the shell doesn't...
Warning: AddV2 used in model but handling of op is not specified by shap; will use original gradients
[previous warning repeated 8 times in total]
Warning: StopGradient used in model but handling of op is not specified by shap; will use original gradients
Warning: SpaceToBatchND used in model but handling of op is not specified by shap; will use original gradients
Warning: BatchToSpaceND used in model but handling of op is not specified by shap; will use original gradients
[previous pair repeated 8 times in total, alternating]
Generating 'profile' shap scores
Done 0 examples of 30000
[...progress lines omitted...]
Done 3400 examples of 30000
Traceback (most recent call last):
File "/home/colin/miniconda3/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
sys.exit(main())
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 23, in main
pipelines.chrombpnet_train_pipeline(args)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 136, in chrombpnet_train_pipeline
interpret.main(args_copy)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/interpret.py", line 132, in main
interpret(model, seqs, args.output_prefix, args.profile_or_counts)
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/interpret.py", line 89, in interpret
profile_shap_scores = profile_model_profile_explainer.shap_values(
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/shap/explainers/deep/deep_tf.py", line 284, in shap_values
bg_data = self.data([X[l][j] for l in range(len(X))])
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/shap_utils.py", line 58, in shuffle_several_times
return [np.array([dinuc_shuffle(s[0]) for i in range(numshuffles)])]
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/evaluation/interpret/shap_utils.py", line 58, in <listcomp>
return [np.array([dinuc_shuffle(s[0]) for i in range(numshuffles)])]
File "/home/colin/miniconda3/envs/chrombpnet/lib/python3.8/site-packages/deeplift/dinuc_shuffle.py", line 107, in dinuc_shuffle
counters[t] += 1
TypeError: 'range_iterator' object does not support item assignment
real 272m5.729s
user 181m37.686s
I am running the tutorial in Docker and will let you know if the same error occurs.
I just finished running the tutorial in docker and ran into the same error as the one mentioned in issue #201:
12405/12405 [==============================] - ETA: 0s
12405/12405 [==============================] - 131s 10ms/step
2024-08-07 13:57:59 Traceback (most recent call last):
2024-08-07 13:57:59 File "/opt/conda/bin/chrombpnet", line 33, in <module>
2024-08-07 13:57:59 sys.exit(load_entry_point('chrombpnet', 'console_scripts', 'chrombpnet')())
2024-08-07 13:57:59 File "/scratch/chrombpnet/chrombpnet/CHROMBPNET.py", line 23, in main
2024-08-07 13:57:59 pipelines.chrombpnet_train_pipeline(args)
2024-08-07 13:57:59 File "/scratch/chrombpnet/chrombpnet/pipelines.py", line 40, in chrombpnet_train_pipeline
2024-08-07 13:57:59 import chrombpnet.training.predict as predict
2024-08-07 13:57:59 File "/scratch/chrombpnet/chrombpnet/training/predict.py", line 14, in <module>
2024-08-07 13:57:59 from scipy import nanmean, nanstd
2024-08-07 13:57:59 ImportError: cannot import name 'nanmean' from 'scipy' (/opt/conda/lib/python3.9/site-packages/scipy/__init__.py)
2024-08-07 14:03:37 (base) root@bc40611dc150:/dockerdir#
It looks like the Docker image installed scipy version 1.13.1 and numpy version 1.23.4.
Thank you in advance for your help!
what installation approach did you use?
I installed Docker, pulled the Docker image, and then ran the container:
(base) baxter@Kirk:~$ docker pull kundajelab/chrombpnet:latest
latest: Pulling from kundajelab/chrombpnet
[...layer pulls omitted...]
Digest: sha256:7c4530ec548302897576a49fe563b4b373548695aee205e45f534ac25eef98eb
Status: Downloaded newer image for kundajelab/chrombpnet:latest
docker.io/kundajelab/chrombpnet:latest
(base) baxter@Kirk:~/Documents/KaillahS/chrombpnet_tutorial$ docker run -it --rm --memory=100g --gpus device=0 -v /home/baxter/Documents/KaillahS/chrombpnet_tutorial:/dockerdir kundajelab/chrombpnet:latest
[TensorFlow ASCII-art banner]
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
I meant how did you install the chrombpnet repo?
Can you try this: open /scratch/chrombpnet/chrombpnet/training/predict.py, comment out the line `from scipy import nanmean, nanstd`, and report back on how the run goes?
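The top-level scipy aliases nanmean/nanstd were removed in newer scipy releases; if editing the file, a drop-in replacement using numpy (a sketch of the workaround, not an official patch) would be:

```python
import numpy as np

# numpy equivalents of the removed top-level scipy aliases
nanmean = np.nanmean
nanstd = np.nanstd

print(nanmean([1.0, float("nan"), 3.0]))  # 2.0 (nan values are ignored)
```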
The code is still running, but making that change resolved the nanmean error. It is almost done generating the profile shap scores, so I hope it doesn't encounter an error in that step as it did two runs ago.
The tutorial pipeline ran successfully and the report matched the tutorial report posted on GitHub. I will make sure to edit the predict.py file before running the pipeline on my own data. The process did take 19 hours to run, which is longer than it has previously taken (approx. 4-5 hrs), so I'm going to spend some time making sure my GPU access is set up properly.
Now that I have ensured that my environment is set up correctly, I wanted to refer back to my original post to get your guidance on the following questions:
- Would you recommend training chromBPNet on only our cell population of interest, or on all nuclei derived from our tissue lysate? My QC-filtered dataset has 3580 cells of my population of interest. I'm inclined to train on the subset of my population of interest only, but am unsure whether my sample is sufficiently large for model training and subsequent analysis. In an above comment, Anshul had approved my bias_model_1, which was trained on the whole population. Should I move forward with this model when training the chromBPNet model?
- How should I handle peak calling files? I was unsure how to handle peak calling for the bias pipeline and have been training various versions of bias models accordingly. I have treated (n = 2 biological replicates) and control (n = 1 biological replicate) samples available. Each of these has a peak.bed file associated with it. According to Anusri in https://github.com/kundajelab/chrombpnet/issues/117, I should not be using the peak.bed files generated by 10x. Am I correct in understanding that the recommendation is to take the merged.bam file created in the previous step and peak call manually using MACS2? In doing this, I would have given the following command:
!macs2 callpeak -t data/downloads/merged.bam -f BAMPE -n "MACS2Peaks" -g "mm" -p 0.01 --shift -75 --extsize 150 --nomodel -B --SPMR --keep-dup all --call-summits --outdir data/downloads/MACS2PeakCallingPE
as recommended by the ENCODE pipeline, with the exception of changing the -f input to "BAMPE" instead of "BAM" since we are working with paired-end data. However, Anshul advised against this change in https://github.com/kundajelab/chrombpnet/issues/176. Does this mean that I should keep the -f argument as "BAM" even though I am working with paired-end data?
- Multiple folds: I would like to confirm my understanding of the usage of multiple folds. As I understand it, I should create multiple folds in the splits folder (is there a recommended number?), each of which contains a different combination of training and validation chromosomes. I would then train a bias model and a chromBPNet model for each fold separately. Later on, when using the tools, I would average the bigwig or h5 files before inputting them into a given tool. Please let me know if this sounds right.
Thank you for your guidance!
I just checked my GPUs with the nvidia-smi command, and it looks like everything is installed correctly and compatible (the docker container and my system are both using CUDA 12.4). Is there a reason that this run is taking so much longer, or is this to be expected?
This is something you need to figure out based on your hardware. It should not take 19 hours. It is likely it's not using the GPU. You said you previously got it to run in 4-5 hours. What did you change?
I had to change PCs as the old one is having disk-space issues and crashing. The 4-5 hour runs were on the old PC via Jupyter notebook or the terminal, whereas I am now using a Docker container on the new PC. Based on issue #95, I think it may be because my PC is using Python 3.12.1, which is keeping me from installing tensorflow-gpu version 2.8:
(base) baxter@Kirk:~$ python -V
Python 3.12.1
(base) baxter@Kirk:~$ pip install tensorflow-gpu==2.8
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.8 (from versions: 2.12.0)
ERROR: No matching distribution found for tensorflow-gpu==2.8
I just tried creating a new container and running the pipeline after ensuring that, within the container, I have CUDA 11.2, cuDNN 8.1, TensorRT 7.2.2, and tensorflow-gpu 2.8, but it looks like it's still only using CPUs.
I've successfully trained one chrombpnet model and figured out how to use the various tools. It is my understanding that the outputs are going to be more accurate when using multiple models each trained on different folds and then averaging out their outputs. Should each model be using the same bias model or do I also need to train several separate ones?
Hi - I've trained 3 models across different folds for the full data set as well as the population of interest. I am hesitant to average out the results because I am unsure of how to check the model's performance before moving forward with its outputs.
As of now, I plan on choosing the model with the best performance. However, looking at the individual models, I am unsure which one to move forward with, as I am concerned that the validation losses only decrease by around 3%:
Subset_0: 3.8% Subset_1: 2.4% Subset_2: 2%
Total_0: 2.6% Total_1: 2.9% Total_2: 3.4%
I wanted to check in and see if this is standard for this pipeline or if there is something I should adjust on my end, since all other metrics meet the given thresholds.
Thank you!
What are the other performance metrics for the 3 folds (correlation of observed and predicted log counts across test-set peaks) as output by the code? Please don't use the loss to evaluate models. The loss values are not necessarily comparable across models or folds, and they are not particularly calibrated against an interpretable upper/lower bound.
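The metric referred to here can be computed roughly as follows (a sketch, not chrombpnet's own evaluation code; the exact log transform chrombpnet applies may differ):

```python
import numpy as np

def log_counts_pearsonr(obs_counts, pred_log_counts):
    """Pearson r between log(1 + observed counts) and predicted log counts
    over a set of test-set peaks."""
    obs_log = np.log1p(obs_counts)
    return np.corrcoef(obs_log, pred_log_counts)[0, 1]

obs = np.array([10.0, 120.0, 45.0, 300.0])
pred = np.log1p(obs)                    # perfect predictions give r = 1
print(round(log_counts_pearsonr(obs, pred), 3))  # 1.0
```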
@akundaje - thank you for the prompt response! The other metrics are fairly constant for all 6 models:
peaks.pearsonr ~ 0.75; peaks.mse ~ 0.25; peaks.median_jsd ~ 0.37 for total, 0.45 for subset; peaks.median_norm_jsd ~ 0.37 for total, 0.32 for subset; average of max profiles ~ 0.002
Looks good. You should average predictions and contribution scores across folds rather than using the "best fold"
Ok, thank you! I will average out the results and see what I get.
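For the averaging step, a minimal sketch (function and array names hypothetical) of combining per-fold outputs, e.g. predicted profiles or contribution scores loaded from each fold's h5/bigwig exports:

```python
import numpy as np

def average_folds(fold_arrays):
    """Element-wise mean across same-shape arrays, one per fold."""
    return np.mean(np.stack(fold_arrays, axis=0), axis=0)

# Toy per-fold contribution scores for three positions
fold0 = np.array([0.1, 0.4, 0.2])
fold1 = np.array([0.3, 0.2, 0.2])
print(average_folds([fold0, fold1]))  # [0.2 0.3 0.2]
```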
Hi,
I am integrating ChromBPNet analysis to refine snATAC-seq data (10x Genomics) derived from skeletal muscle lysate. I am running into the following questions when training chromBPNet:
- Is there a minimum number of cells or read depth necessary for training chromBPNet? Our total number of cells pre-QC filtering is 6305 and 4705 post-QC filtering.
- Would you recommend training chromBPNet on only our cell population of interest, or on all nuclei derived from our tissue lysate? My QC-filtered dataset has 3580 cells of my population of interest. I'm inclined to train on the subset of my population of interest only, but am unsure whether my sample is sufficiently large for model training and subsequent analysis.
- How should I handle peak calling files? I have treated (n = 2 biological replicates) and control (n = 1 biological replicate) samples available. Each of these has a peak.bed file associated with it. According to Anusri in issue #117 on GitHub, I should not be using the peak.bed files generated by 10x. Am I correct in understanding that the recommendation is to take the merged.bam file created in the previous step and peak call manually using MACS2? In doing this, I would have given the following command:
!macs2 callpeak -t data/downloads/merged.bam -f BAMPE -n "MACS2Peaks" -g "mm" -p 0.01 --shift -75 --extsize 150 --nomodel -B --SPMR --keep-dup all --call-summits --outdir data/downloads/MACS2PeakCallingPE
as recommended by the ENCODE pipeline, with the exception of changing the -f input to "BAMPE" instead of "BAM" since we are working with paired-end data. However, Anshul advised against this change in issue #176. Does this mean that I should keep the -f argument as "BAM" even though I am working with paired-end data?
- Multiple folds: I would like to confirm my understanding of the usage of multiple folds. As I understand it, I should create multiple folds in the splits folder (is there a recommended number?), each of which contains a different combination of training and validation chromosomes. I would then train a bias model and a chromBPNet model for each fold separately. Later on, when using the tools, I would average the bigwig or h5 files before inputting them into a given tool. Please let me know if this sounds right.
Thank you!