Closed richelbilderbeek closed 4 years ago
Reinstalled netmhc2pan, then try:
p230198@peregrine:bbbq_1_fast sbatch run_r_script.sh create_counts_per_proteome.R human h14
Submitted batch job 13422720
Great, the error message has already improved, from
p230198@peregrine:bbbq_1_fast cat $(ls | egrep 720)
Rscript create_counts_per_proteome.R human h14
haplotype: HLA-DRB1*0101
mhc_class: 2
peptide_length: 13
ic50_prediction_tool: netmhc2pan
Error in netmhc2pan::run_netmhc2pan(fasta_filename = temp_fasta_filename, :
ERROR: Wrong input format.
Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> <Anonymous> -> <Anonymous>
Execution halted
to
p230198@peregrine:bbbq_1_fast cat $(ls | egrep 791)
Rscript create_counts_per_proteome.R human h14
haplotype: HLA-DRB1*0101
mhc_class: 2
peptide_length: 13
ic50_prediction_tool: netmhc2pan
Error in netmhc2pan::run_netmhc2pan(fasta_filename = temp_fasta_filename, :
NetMHCIIpan output file no created. alleles_as_word: 'DRB1_0101,'. peptide_length: '13'. temp_xls_filename: '/home/p230198/.cache/temp_7b6579c3a6b9.xls'. temp_xls_filename: '/home/p230198/.cache/temp_7b6579c3a6b9.xls'. fasta_filename: '/home/p230198/.cache/temp_7b652a5c1b31.fasta'. utils::head(readLines(fasta_filename)): '>seq1
GTGG'. NetMHCII error output: 'ERROR: Wrong input format.'
Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> <Anonymous> -> <Anonymous>
Execution halted
The FASTA that fails:
p230198@peregrine:bbbq_1_fast cat /home/p230198/.cache/temp_7b652a5c1b31.fasta
>seq1
GTGG
it is remarkably short :confused:
Added more descriptive errors in netmhc2pan
. Also: bbbq checks for short sequence lengths.
Run again, after installing bbbq
:
p230198@peregrine:bbbq_1_fast q
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13427123 regular install_ p230198 R 0:46 1 pg-node210
p230198@peregrine:bbbq_1_fast sbatch --dependency=afterok:13427123 run_r_script.sh create_counts_per_proteome.R human h14
Submitted batch job 13427149
In #56 I found out that there is a sequence with a 'U' amino acids that caused the problem.
Hmmm, NetMHC2pan does handle U
:
test_that("NetMHC2pan does handle U", {
if (!pureseqtmr::is_pureseqtm_installed()) return()
if (!netmhc2pan::is_netmhc2pan_installed()) return()
protein_sequences <- c(
"FANTASTICALLY",
"FANTASTICALLU"
)
t <- predict_counts_per_proteome(
protein_sequences = protein_sequences,
haplotype = get_mhc2_haplotypes()[1],
peptide_length = 13,
percentile = 0.05,
ic50_prediction_tool = "netmhc2pan"
)
t
expect_true(tibble::is_tibble(t))
expect_equal(nrow(t), length(protein_sequences))
expect_true("n_binders" %in% names(t))
expect_true("n_binders_tmh" %in% names(t))
expect_true("n_spots" %in% names(t))
expect_true("n_spots_tmh" %in% names(t))
})
This is great: it now fails fast:
p230198@peregrine:bbbq_1_fast cat $(ls | egrep 13427149)
Rscript create_counts_per_proteome.R human h14
haplotype: HLA-DRB1*0101
mhc_class: 2
peptide_length: 13
ic50_prediction_tool: netmhc2pan
Error in bbbq::check_protein_sequences_length(protein_sequences = protein_sequences, :
Protein sequence shorter than epitope peptide length .
'peptide_length': 13
sequence index: 1217
sequence length: 4
sequence: GTGG
Calls: <Anonymous> -> <Anonymous>
Execution halted
###############################################################################
Peregrine Cluster
Job 13427149 for user 'p230198'
Finished at: Thu Aug 27 09:23:55 CEST 2020
Job details:
============
Name : run_r_script
User : p230198
Partition : regular
Nodes : pg-node193
Cores : 1
State : FAILED
Submit : 2020-08-27T09:13:45
Start : 2020-08-27T09:23:47
End : 2020-08-27T09:23:55
Reserved walltime : 10-00:00:00
Used walltime : 00:00:08
Used CPU time : 00:00:02 (efficiency: 37.19%)
% User (Computation): 69.21%
% System (I/O) : 30.76%
Mem reserved : 10G/node
Max Mem used : 0.00 (pg-node193)
Max Disk Write : 0.00 (pg-node193)
Max Disk Read : 0.00 (pg-node193)
Acknowledgements:
=================
Please see this page for information about acknowledging Peregrine in your publications:
https://wiki.hpc.rug.nl/peregrine/additional_information/scientific_output
################################################################################
Aha, there is a very small peptide in the human proteome:
p230198@peregrine:bbbq_1_fast cat human_proteins_lut.csv | egrep "GTGG$"
p1217,sp|P0DPI4|TDB01_HUMAN,GTGG
Checking in the original file:
p230198@peregrine:bbbq_1_fast cat human.fasta | egrep "^GTGG$" -A 1 -B 1
>sp|P0DPI4|TDB01_HUMAN T cell receptor beta diversity 1 OS=Homo sapiens OX=9606 GN=TRBD1 PE=4 SV=1
GTGG
>sp|Q6N021|TET2_HUMAN Methylcytosine dioxygenase TET2 OS=Homo sapiens OX=9606 GN=TET2 PE=1 SV=3
Indeed, the human reference proteome contains a 4-mer!
Created #58 to handle this Issue, now knowing the cause.
Needed for #54.
From log file: run_r_script_13381509.log