riboviz / example-datasets

Example datasets to run with RiboViz
Apache License 2.0

Change data input file yeast_codon_pos_i200.RData to alternative file format #115

Open ewallace opened 2 years ago

ewallace commented 2 years ago

Update the codon usage table input to take the flat-text yeast_codon_table.tsv file instead of the deprecated .Rdata format.

This is in response to pull request https://github.com/riboviz/riboviz/pull/287, which fixed issue https://github.com/riboviz/riboviz/issues/194

This will apply only to S. cerevisiae data because, as far as I know, that is the only species we have calculated a codon table for.

Eldonoho99 commented 2 years ago

@ewallace I changed yeast_codon_pos_i200.RData to yeast_codon_table.tsv in the 3.1 YAML files for all Saccharomyces cerevisiae datasets. Running with --validated_only was fine, but running the updated YAML files with yeast_codon_table.tsv failed. I believe this is because generate_stats_figs.R is hardcoded to accept an RData file. Yesterday we discussed addressing this at the next hackathon.

The error was as follows:

    summarise() ungrouping output (override with .groups argument)
    summarise() ungrouping output (override with .groups argument)
    Saving 7 x 7 in image
    Warning messages:
    1: In write.table(metagene_start_stop_read_counts_data, file = tsv_file_path, :
      appending column names to file
    2: In write.table(gene_poslen_counts_5start_df, file = tsv_file_path, :
      appending column names to file
    Saving 7 x 7 in image
    Warning message:
    In write.table(read_counts_by_length_data, file = tsv_file_path, :
      appending column names to file
    Warning message:
    In write.table(all_out, file = tsv_file_path, append = T, sep = "\t", :
      appending column names to file
    Parsed with column specification:
    cols(
      read_length = col_double(),
      asite_displacement = col_double()
    )
    Note: Using an external vector in selections is ambiguous.
    ℹ Use all_of(feat_names) instead of feat_names to silence this message.
    ℹ See https://tidyselect.r-lib.org/reference/faq-external-vector.html.
    This message is displayed once per session.
    Warning messages:
    1: In write.table(read_frame_per_orf_filtered_data, file = tsv_file_path, :
      appending column names to file
    2: In write.table(gene_read_frames_data, file = tsv_file_path, append = T, :
      appending column names to file
    Saving 7 x 7 in image
    Warning message:
    In write.table(metagene_normalized_profile_start_stop_data, file = tsv_file_path, :
      appending column names to file
    Warning message:
    In write.table(tpms, file = tsv_file_path, append = T, sep = "\t", :
      appending column names to file
    Saving 7 x 7 in image
    geom_smooth() using formula 'y ~ x'
    Warning messages:
    1: Removed 8629 rows containing non-finite values (stat_smooth).
    2: Removed 8629 rows containing missing values (geom_point).
    3: In write.table(features_plot_data, file = tsv_file_path, append = T, :
      appending column names to file
    Error in load(codon_positions_file) :
      bad restore file magic number (file may be corrupted) -- no data loaded
    Calls: CalculateCodonSpecificRibosomeDensity -> load
    In addition: Warning message:
    file ‘yeast_codon_table.tsv’ has magic number 'Gene '
      Use of save versions prior to 2 is deprecated
    Execution halted
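
The final error comes from load() reading the first bytes of the TSV (its header, starting with "Gene") and not finding an R save-file header. One way to make the input flexible is to inspect those leading bytes before choosing a reader. The following is only a rough Python sketch of that idea, not the code in generate_stats_figs.R: the function name guess_codon_file_format is hypothetical, the header list covers only the common cases I am sure of, and the example column names other than "Gene" are assumptions.

```python
from pathlib import Path

# Assumption: only the common cases are handled. Uncompressed R save
# files start with a 5-byte header such as "RDX2\n" or "RDX3\n";
# files written with the default gzip compression start with the
# gzip magic bytes instead.
RDATA_MAGIC = (b"RDX2\n", b"RDX3\n")
GZIP_MAGIC = b"\x1f\x8b"


def guess_codon_file_format(path):
    """Return "rdata" or "tsv" based on the file's leading bytes.

    Hypothetical helper for illustration only. A gzip-compressed TSV
    would be misreported as "rdata", so a real implementation would
    need a stricter check.
    """
    with open(Path(path), "rb") as handle:
        head = handle.read(5)
    if head.startswith(GZIP_MAGIC) or head in RDATA_MAGIC:
        return "rdata"
    return "tsv"
```

A caller could then branch on the returned string, loading the RData file as before or parsing the flat-text table with a TSV reader, instead of always calling load().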

acope3 commented 2 years ago

While working on this issue, I remembered that the Rdata files only include codons from position 201 to the end of each transcript, so any gene of 200 codons or fewer is excluded entirely. The code in generate_stats_figs.R that uses these files assumes this holds for any provided Rdata file. We will need to make this code more flexible.

ewallace commented 2 years ago

Yes! Good reminder @acope3! That would explain the different results for the 2 files.

Yes, we should be able to add a filter for starting position. I would like to do that, but I'm not sure when I'll be able to. Next hackathon?
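
A starting-position filter on the flat-text table could look roughly like the Python sketch below. This is an illustration of the logic only, not riboviz code: the function names are hypothetical, the column names "PosCodon" and "Codon" are assumed (only "Gene" is confirmed by the error message above), and the default of 201 simply mirrors the positions kept in the existing Rdata files.

```python
import csv
import io


def read_codon_table(handle):
    """Read a tab-separated codon table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_codon_table(rows, min_position=201):
    """Keep rows whose codon position is at least min_position.

    "PosCodon" is an assumed column name. Genes with fewer than
    min_position codons have no qualifying rows, so they drop out
    without any separate length check.
    """
    return [row for row in rows if int(row["PosCodon"]) >= min_position]


# Tiny made-up table: one gene reaching position 201, one short gene.
example = (
    "Gene\tPosCodon\tCodon\n"
    "YAL001C\t200\tGCT\n"
    "YAL001C\t201\tAAA\n"
    "YAL003W\t150\tTTG\n"
)
rows = read_codon_table(io.StringIO(example))
kept = filter_codon_table(rows, min_position=201)
print([(r["Gene"], r["PosCodon"]) for r in kept])  # [('YAL001C', '201')]
```

Making min_position a parameter would let the same table serve analyses that want all codons (min_position=1) as well as ones that match the old Rdata behaviour.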