kevinkovalchik / RawQuant

RawQuant is a Python package for extracting scan meta data and quantification values from Thermo .raw files.
MIT License

QC features #13

Open chrishuges opened 6 years ago

chrishuges commented 6 years ago

Discussion spot for features of the QC implementation of RawQuant.

Items

  1. Do we want to use real-time monitoring of a location?

    • The benefit of this is that all new files are automatically captured and processed.
    • The drawback is that if data file locations change (e.g. data getting moved to a server for long-term storage), RQ has to be made to follow them.
    • There is also the question of data file types: if a PRM file gets acquired, will it just crash RQ?
    • I do like the idea of a 'manually-triggered' setup. Realistically, automatic processing isn't necessary provided RQ processes files fast enough. I think most users are going to sit down and look at the QC data, and triggering the processing at the start of that session is not a big deal.
  2. How to incorporate plots/reports?

    • I like Bokeh for this. It allows interactive plots that users can scan through or zoom with. Also a lot of flexibility in plot types.
  3. What types of information in the output?

    • Number of scans across all MS levels
    • topN, Hz, other scan rate metrics
    • Average peak widths
    • Average MS signal values across all scan levels
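
As a rough illustration of the 'manually-triggered' idea from item 1, here is a minimal sketch of a run that picks up only the .raw files not seen in a previous run and does not stop if one file fails (for example an unsupported acquisition type). Everything here is an assumption made for illustration: the log file name `qc_processed.json`, the function names, and the `process_file` callable, which stands in for whatever RawQuant would actually do per file.

```python
import json
from pathlib import Path


def _load_log(log_path):
    """Read the set of already-processed file names (empty if no log yet)."""
    if log_path.exists():
        return set(json.loads(log_path.read_text()))
    return set()


def find_unprocessed(qc_dir, log_name="qc_processed.json"):
    """Return .raw files in qc_dir that are not listed in the log file."""
    qc_dir = Path(qc_dir)
    done = _load_log(qc_dir / log_name)
    return sorted(p for p in qc_dir.glob("*.raw") if p.name not in done)


def run_qc(qc_dir, process_file, log_name="qc_processed.json"):
    """Process new files once, on demand.

    process_file is a hypothetical callable taking a Path. A file that
    raises is reported and left unlogged, so it is retried next time.
    """
    qc_dir = Path(qc_dir)
    log_path = qc_dir / log_name
    processed, failed = [], []
    for path in find_unprocessed(qc_dir, log_name):
        try:
            process_file(path)
        except Exception as err:  # keep going if one file is unreadable
            failed.append((path.name, str(err)))
        else:
            processed.append(path)
    done = _load_log(log_path) | {p.name for p in processed}
    log_path.write_text(json.dumps(sorted(done), indent=2))
    return processed, failed
```

Because the log lives inside the QC directory, moving the whole directory to long-term storage carries its processing history with it, and several QC directories can be handled by calling `run_qc` once per directory.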

kevinkovalchik commented 6 years ago

  1. Something else to consider is that users will want to QC files differently depending on the nature of the experiment. It would therefore be advantageous to support multiple QC directories, and they probably won't want 5 instances of Python sitting idle most of the time. So I agree that triggering the QC process is a better approach.

  2. Using Bokeh sounds like a good plan. I only have limited experience with it, but I can see there would be advantages compared to PDF files or forcing the user to use popup matplotlib windows.

  3. In addition to averages or medians, we might want to include some information on distribution, e.g. variance or percentiles. This isn't data I have looked at before, but I imagine it could be useful.
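
To make the distribution idea in item 3 concrete, here is a minimal sketch that reports scan counts per MS level together with the median, sample variance, and quartiles of one per-scan value. The input layout (a list of dicts with an `ms_level` key) and the function name are assumptions for illustration, not RawQuant's actual data model.

```python
import statistics
from collections import defaultdict


def summarize_by_ms_level(scans, value_key):
    """Summarize one per-scan value separately for each MS level.

    scans: iterable of dicts holding 'ms_level' and value_key
    (a hypothetical layout chosen for this sketch).
    """
    grouped = defaultdict(list)
    for scan in scans:
        grouped[scan["ms_level"]].append(scan[value_key])

    summary = {}
    for level, values in sorted(grouped.items()):
        entry = {"n_scans": len(values), "median": statistics.median(values)}
        # Variance and quartiles need at least two data points.
        if len(values) >= 2:
            q1, _, q3 = statistics.quantiles(values, n=4)
            entry["variance"] = statistics.variance(values)
            entry["q1"] = q1
            entry["q3"] = q3
        summary[level] = entry
    return summary
```

Calling it with, say, `value_key="signal"` would give one small dictionary per MS level, which is straightforward to tabulate or plot later.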

For separation data, in addition to peak width we can calculate a symmetry index, e.g. is the average peak fronting or tailing and by how much?
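
One common way to express such a symmetry index is the asymmetry factor: the back half-width of the peak divided by the front half-width, both measured at 10% of the apex height. Values above 1 indicate tailing and values below 1 indicate fronting. The sketch below assumes a single, already-isolated peak given as parallel lists of retention times and intensities; the function name and the linear interpolation between points are choices made for this example.

```python
def asymmetry_factor(times, intensities, height_fraction=0.10):
    """Return back/front half-width ratio at a fraction of apex height.

    Returns None if the trace never drops to the threshold on both sides.
    Assumes times are strictly increasing and the trace holds one peak.
    """
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    if intensities[apex] <= 0:
        return None
    threshold = intensities[apex] * height_fraction

    def crossing(indices):
        # Walk away from the apex until the signal reaches the threshold,
        # then interpolate linearly between the last two points.
        prev = apex
        for i in indices:
            if intensities[i] <= threshold:
                y0, y1 = intensities[i], intensities[prev]
                frac = (threshold - y0) / (y1 - y0)
                return times[i] + frac * (times[prev] - times[i])
            prev = i
        return None

    left = crossing(range(apex - 1, -1, -1))
    right = crossing(range(apex + 1, len(intensities)))
    if left is None or right is None:
        return None
    front = times[apex] - left
    back = right - times[apex]
    return back / front
```

Averaging this value over many peaks in a run (or reporting its median) would give the "average peak" answer asked about here.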

Other stuff: In known QC samples like BSA digest, we can monitor peak capacity and mass accuracy.
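
For reference, both quantities can be computed with very little code. The sketch below uses two widely used definitions, which are assumptions about what would be wanted here: mass error in parts per million relative to the theoretical m/z, and gradient peak capacity estimated as 1 plus the gradient time divided by the mean peak width at base (same time units). The function names and the example numbers are hypothetical.

```python
def mass_error_ppm(observed_mz, theoretical_mz):
    """Signed mass error in ppm relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_capacity(gradient_time, mean_peak_width):
    """Estimate gradient peak capacity as 1 + t_gradient / mean base width.

    Both arguments must be in the same time units (e.g. minutes).
    """
    if mean_peak_width <= 0:
        raise ValueError("mean_peak_width must be positive")
    return 1 + gradient_time / mean_peak_width


# Hypothetical numbers, for illustration only.
error = mass_error_ppm(observed_mz=500.0005, theoretical_mz=500.0)
capacity = peak_capacity(gradient_time=60.0, mean_peak_width=0.5)
```

For a known sample such as a BSA digest, the theoretical m/z values would come from the expected peptides, so tracking the ppm error over time shows calibration drift.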