pkpdapp-team / pkpdapp

A web application for modeling the distribution and effects of drugs.
BSD 3-Clause "New" or "Revised" License

upload data #347

Closed: martinjrobins closed this issue 2 months ago

martinjrobins commented 9 months ago

The Data page will have the following tabs:

Load Data

Stratification

Visualization

martinjrobins commented 8 months ago

example dataset: https://github.com/pkpdapp-team/pkpdapp-datafiles/blob/main/usecase2/PKPD_UseCase_Abx.csv

eatyourgreens commented 8 months ago

Another example dataset, with separate columns for amount and observation units: https://github.com/pkpdapp-team/pkpdapp-datafiles/blob/main/usecase_monolix/TE_Data.txt

martinjrobins commented 8 months ago

A few more points based on discussions in Basel on 16/02/24:

martinjrobins commented 8 months ago

Here are some example data files from Michael:

Data File_pkpd explorer_02.csv Data File_pkpd explorer_05.csv Data File_pkpd explorer_04.csv Data File_pkpd explorer_03.csv Data File_pkpd explorer_06.csv Data File_pkpd explorer_01.csv

Some additional info: a few general rules/considerations we discussed before:

  1. Any non-numerical input (including empty cells) should be ignored with a warning (e.g., "time variable includes non-numerical inputs which will be ignored"). Any negative value for time should be ignored with a warning ("time entries < 0 have been ignored").
  2. Animal IDs:
     2a. If no animal identifier (ID) is specified, all data are considered to originate from one animal and one dosing regimen.
     2b. (alternative) If an animal identifier (ID) is not specified, assign a new animal ID whenever a time entry is less than the previous entry.
     Can we include this as a user selection in our stepper? E.g., "Do you want to treat all data as a naive pool (coming from one individual)?" Yes (solution 2a), No (solution 2b). Add wording for the No selection, e.g., "pkpd explorer will automatically identify animal IDs based on the time column (may introduce mistakes)".
  3. If animal IDs are specified, allow the user to sort animal IDs into groups (unless a group ID is specified).
  4. If AMT (amount dosed) and ADM (administration site) are not specified in the data file, the user needs to specify this information in Trial Design for each group.
  5. Dosing events are any rows in which AMT and ADM contain a numerical value other than 0. For PK, any observations in those rows should be ignored with a warning (e.g., "one or more observations coincide with a dosing event and will be ignored").
  6. Multiple dosing events can be specified in different ways in the data file:
     a. ADDL (additional doses) and II (inter-dose interval) specify the number of additional doses and the dosing interval.
     b. Each dosing event is included as a separate row in the data file.
  7. BLQ data, i.e., data below the limit of quantification, may be indicated by a zero, BLQ, LLoQ or < in the observation column. They may also be identified if the corresponding entry in the column cens (censored information) = 1. In the first instance, I suggest we ignore those (an extension of rule 1, including a zero entry). In the long term, we might want to include this in the stepper, e.g., "BLQ data have been identified in the dataset. How do you want to treat them?"
     a. Ignore.
     b. Set all BLQ data as LLoQ/2. User needs to provide LLoQ value and units.
     c. Set the first BLQ data point (first occurrence in time) for each animal and dosing occasion as LLoQ/2. User needs to provide LLoQ value and units.
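Most of these rules are mechanical enough to sketch. The following is a minimal, hypothetical Python sketch (not pkpdapp's implementation) of the non-numerical/negative time filtering, the interim "ignore BLQ" rule, the 2a/2b ID choice, dosing-event detection and ADDL/II expansion. It assumes rows are dicts keyed by NONMEM/Monolix-style column names (TIME, DV, AMT, ADM, CENS); every function and constant name here is illustrative.

```python
import math

# Hypothetical BLQ markers, compared case-insensitively; "<..." is handled separately.
BLQ_MARKERS = {"BLQ", "LLOQ"}


def parse_number(raw):
    """Return raw as a finite float, or None if it is empty or non-numerical."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def is_dosing_event(row):
    """A dosing event is a row whose AMT and ADM are both non-zero numbers."""
    amt = parse_number(row.get("AMT"))
    adm = parse_number(row.get("ADM"))
    return amt not in (None, 0.0) and adm not in (None, 0.0)


def observation_value(raw, cens=None):
    """Return a usable observation, or None if it is non-numerical or BLQ.

    Interim BLQ rule: BLQ/LLoQ markers, "<..." entries, zeros and CENS = 1
    are all ignored, like any other non-numerical input.
    """
    text = "" if raw is None else str(raw).strip()
    if text.upper() in BLQ_MARKERS or text.startswith("<") or parse_number(cens) == 1:
        return None
    value = parse_number(text)
    return None if value in (None, 0.0) else value


def assign_ids(times, naive_pool):
    """2a: one ID for everything (naive pool); 2b: new ID whenever time decreases."""
    if naive_pool:
        return [1] * len(times)
    ids, current = [], 1
    for i, t in enumerate(times):
        if i > 0 and t < times[i - 1]:
            current += 1
        ids.append(current)
    return ids


def expand_additional_doses(time, amount, addl=0, ii=0.0):
    """Expand one dosing row into the first dose plus ADDL doses every II time units."""
    return [(time + k * ii, amount) for k in range(int(addl) + 1)]


def clean_rows(rows):
    """Drop unusable rows and collect one warning per kind of problem."""
    kept, warnings = [], set()
    for row in rows:
        time = parse_number(row.get("TIME"))
        if time is None:
            warnings.add("time variable includes non-numerical inputs which will be ignored")
        elif time < 0:
            warnings.add("time entries < 0 have been ignored")
        elif is_dosing_event(row):
            # Keep the dose, but drop any PK observation recorded on the same row.
            if observation_value(row.get("DV")) is not None:
                warnings.add("one or more observations coincide with a dosing event and will be ignored")
            kept.append({**row, "TIME": time, "DV": None})
        else:
            dv = observation_value(row.get("DV"), row.get("CENS"))
            if dv is None:
                warnings.add("non-numerical or BLQ observations will be ignored")
            else:
                kept.append({**row, "TIME": time, "DV": dv})
    return kept, sorted(warnings)
```

In this sketch `assign_ids` would run on the times of the kept rows, with `naive_pool` taken from the proposed stepper question (Yes for 2a, No for 2b).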
martinjrobins commented 7 months ago

Some more data to try: Data File_pkpd explorer_multipleYTYPE.csv. This is simulated data:

Species: Monkey
PK model: 1-compartment PK model + bioavailability
PD model: direct-effect Emax

Parameters: default, except CL = 0.8 mL/h/kg, F = 0.9, C50 = 1,000,000 pmol/L, Emax = 5

C1 mapped to YTYPE 1 and E mapped to YTYPE 2.
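To sanity-check the simulated E values (YTYPE 2) against C1 (YTYPE 1), the direct-effect Emax relationship can be written as a one-line function. This is a sketch assuming the common hyperbolic form E = E0 + Emax * C / (C50 + C) with an assumed zero baseline (E0) and no Hill coefficient; the app's "direct effects Emax" model may differ, and C1 must be expressed in the same units as C50 (pmol/L). The function name is illustrative.

```python
def direct_emax_effect(conc, emax=5.0, c50=1_000_000.0, e0=0.0):
    """Direct-effect Emax model: E = E0 + Emax * C / (C50 + C).

    conc and c50 must share units (pmol/L here). e0 = 0 is an assumed
    baseline; a Hill coefficient, if the app uses one, is not modelled.
    """
    return e0 + emax * conc / (c50 + conc)
```

With these parameters, E reaches half of Emax (2.5) when C1 equals C50.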

martinjrobins commented 7 months ago

Some more todos on this from the meeting 12/4/24.

eatyourgreens commented 6 months ago

https://github.com/pkpdapp-team/pkpdapp/pull/388 splits the CSV validation into errors (where columns are required in order to continue) and warnings (where columns are missing but can be inferred from the data or added later). It also revalidates the CSV when headers change, rather than validating once after the initial upload. That should stop you from proceeding until errors have been dealt with, e.g., setting a time unit when one is missing.

If the CSV has no header row, you'll get errors saying that Time and Observation columns are missing, but the only way to clear those would be to upload a new CSV with headers.
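The error/warning split can be illustrated with a small pure function that is simply re-run whenever a header mapping changes. This is a hypothetical Python sketch, not the code in the pull request; the column names "Time Unit", "Observation Unit", "ID" and "Amount" and the message wording are assumptions.

```python
REQUIRED_COLUMNS = ("Time", "Observation")  # cannot continue without these


def validate_headers(headers, time_unit=None):
    """Split header problems into blocking errors and non-blocking warnings.

    `headers` are the current column mappings; call this again whenever a
    header changes so that errors and warnings stay up to date.
    """
    present = {h.strip().lower() for h in headers}
    errors = [f"{name} column is missing" for name in REQUIRED_COLUMNS
              if name.lower() not in present]
    warnings = []
    # A missing time unit blocks progress until the user sets one.
    if "time unit" not in present and time_unit is None:
        errors.append("Time unit is not defined: please set a time unit")
    # The remaining gaps can be inferred from the data or filled in later.
    if "id" not in present:
        warnings.append("No ID column: subject IDs will be inferred from the time column")
    if "observation unit" not in present:
        warnings.append("Observation units have not been defined in the dataset "
                        "and need to be defined manually")
    if "amount" not in present:
        warnings.append("No Amount column: dosing must be specified in Trial Design")
    return errors, warnings
```

A CSV without a header row (e.g. headers "0", "1.5", "12") produces three errors here, and only a new upload with proper headers would clear the missing Time and Observation errors.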

martinjrobins commented 6 months ago

looks excellent, thanks @eatyourgreens!

martinjrobins commented 6 months ago

snag list for data upload:

Tab | Problem description | Step | Item | Proposed fix
Data | No warning for conc/obs units | 1 | DV/conc/obs | Provide additional text "Concentration units have not been defined in the dataset and need to be defined manually"
Data | Unclear how IDs/groups are assigned | 1 | ID warning | Provide additional text "A new subject ID is assigned according to the time column, if time is provided in ascending order for each individual"
Data | Unable to select a dosing compartment w/o model selection | 2 | Dosing compartment | Please provide an error message if no PK model is selected, e.g., "please select a PK model"
Data | Unable to select a variable w/o model selection | 3 | Map variable | Please provide an error message if no PK model is selected, e.g., "please select a PK model"
Data | Stratification occurs after this information is first required (step 2) | 4 | Stratification | Would this be better placed as the 2nd step, as the dosing compartment may differ for different groups/cohorts?
Data | Unclear how to introduce a new group | 4 | Stratification | It works by hitting "enter", but is this how it's supposed to work? Not ideal for iPad, iPhone. Suggest introducing a "confirm" button.
Data | For datasets w/o dosing information, the dosing information taken from Trial Design is visible in Data/Protocols, but the IDs need reworking | Final | Datafile | tbd
Data | Subject ID is not provided in Data/Observations | Final | Datafile | tbd
Data | Not possible to export datafile | Final | Datafile | Include Export Datafile functionality
General | App is quite a bit slower than before | | | tbd
Data | Replacing a datafile disables the "next" button. If you load a datafile, select the time units and then drag and drop a new datafile, the data itself is replaced but you can no longer proceed to the next step | 1 | replace datafile |
Data | Name of datafile not visible | all steps | | it may be useful for the user to be able to document/display the name of the datafile they uploaded
Simulations | Automatic axis scaling works on the simulated data, which may truncate the observations | - | | adjust axis limits on the observed and simulated data
Trial design | remove group does not work | - | |
Trial design | if an additional group is created (w/o data), the dosing protocol still appears in the datafile (dosing information) | - | | Do we want to keep it this way? Con: when people export the datafile for analysis in Monolix etc., this information is redundant. Pro: it provides information on all scenarios investigated, for reproduction in other software.
Simulations | By default only simulation 1 is displayed when reloading the app; all other simulations are treated as temporary | - | | Do we want to keep it this way?
Models | Once data is loaded, switching between models only works if the amount variables have the same names as in the model to which the data were mapped. Works fine for all generic compartmental models, but does not work when switching to TMDD models or from a full TMDD to a QSS TMDD model, due to different amount variable names. | - | | If such a conflict arises, the user will need to remap the dosing compartment in Model and Data.
Data | Once a datafile is accepted, the only way to edit it is to upload it and go through all the steps again. | Final | | Can we add an "edit" button for users to change units, grouping, etc. w/o having to redo all the steps?
Data | "Dimensionless" is not available for units selection | 3 | | map an empty unit column entry to "dimensionless"
Data | descending order of YTYPEs | 3 | YTYPE order | change the order of multiple YTYPEs to ascending
Data | | Final | Upload New Dataset | provide warning "this action will delete the current dataset"
Data | data loader does not recognize if two columns are mapped to "Observation" | 1 | | if multiple columns are mapped to "Observation", treat them like different YTYPEs
Data | | 4 | Secondary grouping | unclear what that does
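The proposed fix for several columns mapped to "Observation" could work by reshaping each wide row into one long row per observed variable, numbering the variables as YTYPEs in ascending order. A minimal hypothetical sketch, assuming rows are dicts and that the YTYPE number follows the order in which the columns are listed; `merge_observation_columns` is an illustrative name, not an existing pkpdapp function. It refuses input that already has an "Observation ID" column, since the two ways of labelling observation types cannot be combined.

```python
def merge_observation_columns(rows, observation_columns):
    """Turn rows with several observation columns into long rows with a YTYPE.

    Each column in `observation_columns` gets YTYPE 1, 2, ... in the order
    given, so YTYPEs appear in ascending order within every input row.
    """
    if len(observation_columns) > 1 and any("Observation ID" in row for row in rows):
        raise ValueError("multiple Observation columns cannot be combined "
                         "with an Observation ID column")
    long_rows = []
    for row in rows:
        for ytype, column in enumerate(observation_columns, start=1):
            value = row.get(column)
            if value in (None, ""):
                continue  # nothing observed for this variable at this time
            new_row = {k: v for k, v in row.items() if k not in observation_columns}
            new_row["Observation"] = value
            new_row["YTYPE"] = ytype
            long_rows.append(new_row)
    return long_rows
```

For the multiple-YTYPE example dataset, calling this with ["C1", "E"] would give C1 YTYPE 1 and E YTYPE 2, matching that dataset's mapping.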
eatyourgreens commented 6 months ago

I've pushed a few small fixes this morning:

The fix for missing subject IDs might also have fixed the 'Remove Group' button. At least, I'm seeing the Data tab correctly refresh now, after removing a group of subjects.

eatyourgreens commented 6 months ago

I think the Python model only recognises one Observation column at the moment, but we could maybe merge multiple observation columns into a single Observation column, with an Observation ID column to group observations by type.

In that case, I don’t think the CSV could have an Observation ID column. I think there are two mutually exclusive cases. We currently only support the first:

eatyourgreens commented 6 months ago

Exporting a CSV from the Data tab is close to being done. I might be able to finish that tomorrow.

martinjrobins commented 6 months ago

Some more snags from the Roche team, which I'll copy here. I've also included the "Fixed" column to indicate what I think has already been fixed (as far as I can see, but you might want to check, @eatyourgreens):

Tab | Problem description | Step | Item | Proposed fix | Fixed | Check | Comment
Data | No warning for conc/obs units | 1 | DV/conc/obs | Provide additional text "Concentration units have not been defined in the dataset and need to be defined manually" | Yes | |
Data | Unclear how IDs/groups are assigned | 1 | ID warning | Provide additional text "A new subject ID is assigned according to the time column, if time is provided in ascending order for each individual" | Yes | |
Data | Unable to select a dosing compartment w/o model selection | 2 | Dosing compartment | Please provide an error message if no PK model is selected, e.g., "please select a PK model" | Yes | | data upload disabled until a PK model is selected
Data | Unable to select a variable w/o model selection | 3 | Map variable | Please provide an error message if no PK model is selected, e.g., "please select a PK model" | Yes | |
Data | Stratification occurs after this information is first required (step 2) | 4 | Stratification | Would this be better placed as the 2nd step, as the dosing compartment may differ for different groups/cohorts? | Yes | |
Data | Unclear how to introduce a new group | 4 | Stratification | It works by hitting "enter", but is this how it's supposed to work? Not ideal for iPad, iPhone. Suggest introducing a "confirm" button. | Yes | |
Data | For datasets w/o dosing information, the dosing information taken from Trial Design is visible in Data/Protocols, but the IDs need reworking | Final | Datafile | tbd | | | what do you mean the IDs need reworking? The ID given in Data/Protocols is the ID of the particular protocol; the group # given in the title is associated with the "Group" column in the main dataset
Data | Subject ID is not provided in Data/Observations | Final | Datafile | tbd | Yes | |
Data | Not possible to export datafile | Final | Datafile | Include Export Datafile functionality | | |
General | App is quite a bit slower than before | | | tbd | | |
Data | Replacing a datafile disables the "next" button. If you load a datafile, select the time units and then drag and drop a new datafile, the data itself is replaced but you can no longer proceed to the next step | 1 | replace datafile | | Yes | |
Data | Name of datafile not visible | all steps | | it may be useful for the user to be able to document/display the name of the datafile they uploaded | | |
Simulations | Automatic axis scaling works on the simulated data, which may truncate the observations | - | | adjust axis limits on the observed and simulated data | Yes | |
Trial design | remove group does not work | - | | | Yes | |
Trial design | if an additional group is created (w/o data), the dosing protocol still appears in the datafile (dosing information) | - | | Do we want to keep it this way? Con: when people export the datafile for analysis in Monolix etc., this information is redundant. Pro: it provides information on all scenarios investigated, for reproduction in other software. | | |
Simulations | By default only simulation 1 is displayed when reloading the app; all other simulations are treated as temporary | - | | Do we want to keep it this way? | Yes | | all simulations displayed on reload
Models | Once data is loaded, switching between models only works if the amount variables have the same names as in the model to which the data were mapped. Works fine for all generic compartmental models, but does not work when switching to TMDD models or from a full TMDD to a QSS TMDD model, due to different amount variable names. | - | | If such a conflict arises, the user will need to remap the dosing compartment in Model and Data. | | |
Data | Once a datafile is accepted, the only way to edit it is to upload it and go through all the steps again. | Final | | Can we add an "edit" button for users to change units, grouping, etc. w/o having to redo all the steps? | | | I think the export datafile will solve this: the user can export, edit manually, then re-upload
Data | "Dimensionless" is not available for units selection | 3 | | map an empty unit column entry to "dimensionless" | Yes | |
Data | descending order of YTYPEs | 3 | YTYPE order | change the order of multiple YTYPEs to ascending | Yes | |
Data | | Final | Upload New Dataset | provide warning "this action will delete the current dataset" | Yes | |
Data | data loader does not recognize if two columns are mapped to "Observation" | 1 | | if multiple columns are mapped to "Observation", treat them like different YTYPEs | | | This fix is incompatible with the Observation ID column; it only works if this column is not provided. Can we display an error if multiple observation columns and an Observation ID column are provided?
Data | | 4 | Secondary grouping | unclear what that does | | | Is secondary grouping still useful to have (i.e., sort into groups based on 2 categorical covariates)? We can either delete this or provide a text description of what it does.
Data | After upload of a datafile and setting the time units, if the user changes a column header, they cannot proceed to the next step: the time units error warning appears, but time units cannot be set. Reselecting the time unit activates the next button. | 1 | | similar to line 12 (replacing a datafile disables the "next" button) | | |
Data | Error message for dosing units states "amount units can be set in Trial Design"; however, amount units are set on this page | 2 | Dosing units | remove this error message | | |
Data | next and back buttons should stay in one location to allow quick movement through the stepper | 1-4 | | | | |
Data | | | Upload New Dataset | do we need this? Why not start this page with the interface the user sees after pressing "Upload New Dataset"? | | |
Data | | 2 | Automated mapping | could we automatically map dosing variables if "Route" information is provided? IV maps to the A1 variable (e.g., A1, A1_f, A1_t); SC or PO maps to Aa | | |
Data | unclear how to select another categorical covariate as primary grouping; the app always seems to jump back to group | 4 | Primary grouping | | | |
Data | how are the IDs assigned for the dosing protocol? | | | | | |
Data | Infusion time = 0 in datasheet | | | It is likely that we will get data with infusion time = 0 (as Monolix and NONMEM accept those); for the pkpd explorer I suggest that if infusion time = 0, we automatically set this to a very short time interval, e.g., 30 seconds. | | |
Data | Multiple dosing with individual dosing events given as separate lines (see right) not recognised | | | App identifies that multiple doses are administered but does not match the time appropriately. Units of dosing are also incorrectly displayed in mg/kg in Trial Design. | | |
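Two of the suggestions in this snag list (replacing an infusion time of 0 with a very short interval, and mapping a "Route" entry to a dosing variable) are small lookups. A minimal hypothetical sketch: it assumes durations are in hours and that the route values are exactly "IV", "SC" and "PO"; the constant and function names are illustrative, not pkpdapp's API.

```python
INFUSION_TIME_FOR_ZERO_H = 30 / 3600  # 30 seconds, expressed in hours (assumed unit)

# IV doses go to the model's A1 variable; SC and PO doses go to Aa.
ROUTE_TO_DOSING_BASE = {"IV": "A1", "SC": "Aa", "PO": "Aa"}


def effective_infusion_time(duration_h):
    """Replace an infusion time of 0 with a very short interval (30 s)."""
    return INFUSION_TIME_FOR_ZERO_H if duration_h == 0 else duration_h


def suggest_dosing_variable(route, model_variables):
    """Suggest a dosing variable for a Route entry, or None if there is no match.

    Prefers an exact name (e.g. A1), then a model-specific variant such as
    A1_f or A1_t, so the suggestion adapts to the selected model.
    """
    base = ROUTE_TO_DOSING_BASE.get(route.strip().upper())
    if base is None:
        return None
    exact = [name for name in model_variables if name == base]
    variants = [name for name in model_variables if name.startswith(base + "_")]
    candidates = exact + variants
    return candidates[0] if candidates else None
```

Returning None for unknown routes leaves the user to choose the dosing compartment manually, as happens today.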