upload data - Githubissues

martinjrobins commented 9 months ago

Data page will have 3 tabs:

[ ] Load Data
[ ] Stratification
[ ] Visualization

Load Data

Load data button (perhaps with table view of data if loaded)
Clicking the load data button prompts for a file, then opens a stepper dialog for loading data. The stepper has the following steps:
1. Map Headers
  1. map headers in the datafile to the standard set expected by the app
2. Map Dosing
  1. map the dosing compartment from the datafile to an amount variable in the model (what if the model is later changed?)
  2. set dosing units (if specified in the file, these need to be set already)
3. Map Observations
  1. map observations in the datafile to a variable in the model (what if the model is later changed?)
  2. set observation units (if specified in the file, these need to be set already)
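A minimal sketch of what the Map Headers step could do, assuming a case-insensitive alias lookup. The standard names and the alias sets here are hypothetical and only for illustration; they are not the app's actual list.

```python
# Hypothetical standard names and synonyms (assumptions, not the app's real set).
HEADER_ALIASES = {
    "ID": {"id", "subject", "animal id"},
    "Time": {"time", "t", "hour"},
    "Observation": {"observation", "dv", "conc", "y"},
    "Amount": {"amount", "amt", "dose"},
}


def map_headers(headers):
    """Return {file header: standard name, or None if nothing matches}."""
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        mapping[header] = next(
            (name for name, aliases in HEADER_ALIASES.items() if key in aliases),
            None,
        )
    return mapping


# Unmatched headers map to None, so the stepper can ask the user about them.
suggested = map_headers(["ID", "TIME", "DV", "AMT", "Weight"])
```

Headers that come back as None would be the ones the user has to map (or ignore) by hand in the stepper.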

Stratification

define grouping of subjects
choose covariates that you wish to use for stratification
if categorical covariates are present allow forming groups based on these
- first priority Group or Cohort
- secondary covariates are Sex/Gender or Disease Status
if no categorical covariates present allow grouping based on dosing information
Groups defined here will be passed on to Trial Design and Simulations
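A rough sketch of the grouping priority described above, assuming rows are dicts keyed by column name. The column spellings and the fallback dosing column name ("Amount") are assumptions.

```python
from collections import defaultdict

# Priority order from the list above; exact column spellings are assumed.
PRIORITY = ["Group", "Cohort", "Sex", "Gender", "Disease Status"]


def choose_grouping_column(rows, dose_column="Amount"):
    """Pick the first priority covariate present, else fall back to dosing."""
    columns = rows[0].keys() if rows else []
    for name in PRIORITY:
        if name in columns:
            return name
    return dose_column if dose_column in columns else None


def group_subjects(rows, column, id_column="ID"):
    """Return {group label: sorted list of subject IDs in that group}."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[column]].add(row[id_column])
    return {label: sorted(ids) for label, ids in groups.items()}
```

The resulting dict is the kind of structure that could be handed on to Trial Design and Simulations.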

Visualization

plot all observations
per plot, allow stratification for all selected covariates. If IDs are defined, represent each ID as a line in a spaghetti plot
maximum 2x2 plots, e.g. with different colours for each dose level and solid vs dashed lines for male / female

martinjrobins commented 8 months ago

example dataset: https://github.com/pkpdapp-team/pkpdapp-datafiles/blob/main/usecase2/PKPD_UseCase_Abx.csv

eatyourgreens commented 8 months ago

Another example dataset, with separate columns for amount and observation units: https://github.com/pkpdapp-team/pkpdapp-datafiles/blob/main/usecase_monolix/TE_Data.txt

martinjrobins commented 8 months ago

A few more points based on discussions in Basel on 16/02/24:

only need to upload a single dataset for now
user gets choice of action for uploading datasets with dosing information:
- a. append new cohorts to existing list of cohorts
- b. replace current cohorts with those in datafile
- c. do not upload dosing information from datafile
new cohort functionality for trial design:
- trial design has multiple tabs for the different cohorts/groups
- need to run multiple simulations for the different cohorts
simulation tab:
- choose all / subset of cohorts to display
- mapped observation variable governs if observation is plotted or not in simulation plot
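The three upload choices (append / replace / do not upload) could be sketched like this. The enum and function names are hypothetical, and cohorts are treated as plain list items for illustration.

```python
from enum import Enum


class DosingUploadAction(Enum):
    APPEND = "append"    # a. add cohorts from the datafile to the existing list
    REPLACE = "replace"  # b. replace current cohorts with those in the datafile
    SKIP = "skip"        # c. do not upload dosing information


def apply_upload(existing, uploaded, action):
    """Return the cohort list after applying the user's chosen action."""
    if action is DosingUploadAction.APPEND:
        return list(existing) + list(uploaded)
    if action is DosingUploadAction.REPLACE:
        return list(uploaded)
    return list(existing)
```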

martinjrobins commented 8 months ago

Here are some example data files from Michael:

Data File_pkpd explorer_02.csv Data File_pkpd explorer_05.csv Data File_pkpd explorer_04.csv Data File_pkpd explorer_03.csv Data File_pkpd explorer_06.csv Data File_pkpd explorer_01.csv

Some additional info: A few general rules/ considerations we discussed before:

1. Any non-numerical input (including empty cells) should be ignored with a warning (e.g., "time variable includes non-numerical inputs which will be ignored"). Any negative value for time should be ignored with a warning ("time entries < 0 have been ignored").
2. Handling of a missing animal identifier (ID):
  2a. If no animal identifier (ID) is specified, all data are considered to originate from one animal and one dosing regimen.
  2b. (alternative) If an animal identifier (ID) is not specified, assign a new animal ID whenever a time entry is less than the previous entry.
  Can we include this as a user selection in our stepper? e.g., "Do you want to treat all data as a naive pool (coming from one individual)?" Yes (solution 2a), No (solution 2b). Add wording for the No selection: e.g., "pkpd explorer will automatically identify animal IDs based on the time column (may introduce mistakes)".
3. If animal IDs are specified, allow the user to sort animal IDs into groups (unless group ID is specified).
4. If AMT (amount dosed) and ADM (administration site) are not specified in the data file, the user needs to specify this information in Trial Design for each group.
5. Dosing events are any rows in which AMT and ADM contain a numerical value other than 0. For PK, any observations in those rows should be ignored with a warning (e.g., "one or more observations coincide with a dosing event and will be ignored").
6. Multiple dosing events can be specified in different ways in the data file:
  a. ADDL (additional dose level) and II (inter-dose interval) specify the additional number of doses and the dosing interval.
  b. Each dosing event is included as a separate row in the data file.
7. BLQ data, i.e., data below the limit of quantification, may be indicated by a zero, BLQ, LLoQ or < in the observation column. They may also be identified if the corresponding entry in column cens (censored information) = 1. In the first instance, I suggest we ignore those (extension of rule 1, including a zero entry). In the long term, we might want to include this in the stepper: e.g., "BLQ data have been identified in the dataset. How do you want to treat them?"
  a. Ignore
  b. Set all BLQ data as LLoQ/2. User needs to provide LLoQ value and units.
  c. Set the first BLQ data (first occurrence in time) for each animal and dosing occasion as LLoQ/2. User needs to provide LLoQ value and units.
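A few of these rules are easy to sketch in code. This is only an illustration under assumptions (values arrive as strings from the CSV, function names are made up), not how the app implements them.

```python
import math


def parse_time(value):
    """Return the time as a float, or None if the entry should be ignored
    (non-numerical, empty, or negative)."""
    try:
        time = float(value)
    except (TypeError, ValueError):
        return None
    return time if math.isfinite(time) and time >= 0 else None


def assign_ids_by_time(times):
    """Solution 2b: start a new animal ID whenever time is less than the
    previous entry."""
    ids, current, previous = [], 1, None
    for time in times:
        if previous is not None and time < previous:
            current += 1
        ids.append(current)
        previous = time
    return ids


def expand_addl(time, addl, interval):
    """ADDL/II: one dose at `time` plus `addl` further doses every `interval`."""
    return [time + i * interval for i in range(addl + 1)]


def is_blq(observation, cens=""):
    """BLQ if marked by zero, BLQ, LLoQ, a leading '<', or cens = 1."""
    text = str(observation).strip().upper()
    if text in {"BLQ", "LLOQ"} or text.startswith("<") or str(cens).strip() == "1":
        return True
    try:
        return float(text) == 0
    except ValueError:
        return False
```

Note that the auto-ID rule only works if each animal's rows are contiguous and sorted by ascending time, which is why the "may introduce mistakes" wording matters.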

martinjrobins commented 7 months ago

Some more data to try Data File_pkpd explorer_multipleYTYPE.csv. This is from simulated data:

Species: Monkey
PK model: 1-compt PK model + bioavailability
PD model: direct effects Emax

Parameters: default except
CL = 0.8 mL/h/kg
F = 0.9
C50 = 1,000,000 pmol/L
Emax = 5

C1 mapped to YTYPE 1 and E mapped to YTYPE 2.

martinjrobins commented 7 months ago

Some more todos on this from the meeting 12/4/24.

put ignore at the top (or bottom) of the select to make it more obvious
TODO: export data + protocols (not urgent)
put more information on the required data format + some examples in the "upload data" stepper tab
Errors when users upload a new dataset:
- Errors which are fixed should disappear (e.g. after a user selects a time unit, the "Please select a unit for all time values" error should go away)
- Some errors should be warnings / more info (e.g. if ID is not detected then ID is auto-assigned. In this case a warning should appear saying that IDs were auto-created. Also, if Admin ID / Amount is not defined, this should be a warning, with info added that the user can define the dosing protocol in Trial Design after data upload is complete)
- You should be prevented from clicking "Next" until all errors are resolved
- an error should be given if a dataset without headers is uploaded (i.e. any file with only numerical entries on the first line)
eatyourgreens commented 6 months ago

https://github.com/pkpdapp-team/pkpdapp/pull/388 splits the CSV validation into errors (where columns are required in order to continue) and warnings (where columns are missing but can be inferred from the data or added later). It also revalidates the CSV when headers change, rather than validating once after the initial upload. That should stop you from proceeding until errors have been dealt with, e.g. setting a time unit when one is missing.

If the CSV has no header row, you'll get errors saying that Time and Observation columns are missing, but the only way to clear those would be to upload a new CSV with headers.
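For anyone following along, the error/warning split can be pictured with a small sketch like the one below. This is not the code from the PR; the column names, warning wording and function names are assumptions. Only the time unit message is taken from the todo list above.

```python
REQUIRED = ("Time", "Observation")

# Hypothetical warning texts for columns that can be inferred or added later.
OPTIONAL = {
    "ID": "ID column not found: subject IDs were auto-created.",
    "Amount": "Amount column not found: dosing can be defined in Trial Design.",
}


def validate(mapped_headers, time_unit=None):
    """Return (errors, warnings) for the current header mapping.

    Meant to be called again whenever the mapping or units change, so that
    fixed errors disappear."""
    errors = [f"{name} column is missing." for name in REQUIRED
              if name not in mapped_headers]
    if "Time" in mapped_headers and not time_unit:
        errors.append("Please select a unit for all time values")
    warnings = [message for name, message in OPTIONAL.items()
                if name not in mapped_headers]
    return errors, warnings


def can_proceed(errors):
    """The Next button stays disabled while any error remains."""
    return not errors
```

Warnings never block the stepper in this sketch; only errors do.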

martinjrobins commented 6 months ago

looks excellent, thanks @eatyourgreens!

martinjrobins commented 6 months ago

snag list for data upload:

Tab	Problem description	Step	Item	Proposed Fix
Data	No warning for conc/ obs units	1	DV/ conc/ obs	Provide additional text "Concentration units have not been defined in the dataset and need to be defined manually"
Data	Unclear how ID/ groups are assigned	1	ID warning	Provide additional text "A new subject ID is assigned according to the time column, if time is provided in ascending order for each individual"
Data	Unable to select a dosing compartment w/o model selection	2	Dosing compartment	Please provide error message if no PK model is selected, e.g., "please select a PK model"
Data	Unable to select a variable w/o model selection	3	Map variable	Please provide error message if no PK model is selected, e.g., "please select a PK model"
Data	Stratification occurs after this information is first required (step 2)	4	Stratification	Would this be better placed as the 2nd step, as dosing compartment may differ for different groups/ cohorts.
Data	Unclear how to introduce a new group	4	Stratification	It works with hitting "enter" but is this how it's supposed to work? Not ideal for ipad, iphone. Suggest introducing a "confirm" button.
Data	For datasets w/o dosing information, the dosing information taken from Trial Design is visible in Data/Protocols but the IDs need reworking	Final	Datafile	tbd
Data	Subject ID is not provided in Data/ Observations	Final	Datafile	tbd
Data	Not possible to export datafile	Final	Datafile	Include Export Datafile functionality
General	App is quite a bit slower than before			tbd
Data	Replacing a datafile disables "next" button. If you load a datafile and select the time units and then drag and drop a new datafile, the data itself is replaced but you can no longer proceed to the next step	1	replace datafile
Data	Name of datafile not visible	all steps		it may be useful for the user to be able to document/ display the name of the datafile they uploaded
Simulations	Automatic axis scaling works on the simulated data, which may truncate the observations	-		adjust axis limits on the observed and simulated data
Trial design	remove group does not work	-
Trial design	if an additional group is created (w/o data), the dosing protocol still appears in the datafile (dosing information)	-		Do we want to keep it this way? Con: when people export datafile for analysis in Monolix/etc. this information is redundant. Pro: It provides information of all scenarios investigated for reproduction in other software.
Simulations	By default only simulation 1 will be displayed when reloading the app, all other simulations are treated as temporary	-		Do we want to keep it this way?
Models	Once data is loaded, switching between models only works if the amount variables have the same name as the model to which the data were mapped. Works fine for all generic comp models, but does not work when switching to TMDD models or switching from a full TMDD to a QSS TMDD model due to different amount variable names.	-		If such a conflict arises, the user will need to remap the dosing compartment in Model and Data.
Data	Once a datafile is accepted, the only way to edit it is to upload and go through all the steps again.	Final		Can we add an "edit" button for users to change units or grouping etc. w/o having to redo all the steps?
Data	"Dimensionless" is not available for units selection	3		map an empty unit column entry to "dimensionless"
Data	descending order of YTYPEs	3	YTYPE order	change order of multiple YTYPES to ascending
Data		Final	Upload New Dataset	provide warning "this action will delete the current dataset"
Data	data loader does not recognize if two columns are mapped to "Observation"	1		if multiple columns are mapped to "Observation" treat them like different YTYPEs
Data		4	Secondary grouping	unclear what that does

eatyourgreens commented 6 months ago

I've pushed a few small fixes this morning:

fix a bug where Subject ID mysteriously disappears after uploading a new CSV (254a09842f614879a581778fb33c940fd983bf64.)
display 'dimensionless' as a unit option, when the unit symbol for a variable is an empty string (#410.)
ascending YTYPE order in the 'Map Variables' step (28398369b426fa948c96f8dde6256d29f575b28b.)
simulations should be displayed by default. You shouldn't need to select the checkboxes, top right of the page (1cf524da8b05a0f6d18420cefade3d7f8dc425bd.)
there's a Confirm button now, to create new groups of subjects (e5b3e20a9d086d36f2182e4cbe26216a7276bf09.)
uploading data is disabled until a model has been selected, similar to Trial Design and Simulations (6f5a7089b37c5c82396c9153f983531008ab4f81.)

The fix for missing subject IDs might also have fixed the 'Remove Group' button. At least, I'm seeing the Data tab correctly refresh now, after removing a group of subjects.

eatyourgreens commented 6 months ago

I think the Python model only recognises one Observation column at the moment, but we could maybe merge multiple observation columns into a single Observation column, with an Observation ID column to group observations by type.

In that case, I don’t think the CSV could have an Observation ID column. I think there are two mutually exclusive cases. We currently only support the first:

single Observation column plus an Observation ID column to track YTYPE for each observation.
multiple Observation columns. YTYPE would be assigned automatically for each column, then the columns would be merged into a single Observation column before uploading.
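A sketch of the second case, assuming rows are dicts of strings and that YTYPE is simply the 1-based position of each observation column. The function name and the "Observation ID" key are illustrative only.

```python
def merge_observation_columns(rows, observation_columns):
    """Reshape several observation columns into one Observation column plus
    an auto-assigned Observation ID (1-based column position)."""
    if any("Observation ID" in row for row in rows):
        # The two cases are mutually exclusive.
        raise ValueError("Cannot merge columns when Observation ID is present")
    merged = []
    for row in rows:
        shared = {key: value for key, value in row.items()
                  if key not in observation_columns}
        for ytype, column in enumerate(observation_columns, start=1):
            value = row.get(column, "")
            if value == "":
                continue  # no observation of this type on this row
            merged.append({**shared, "Observation": value, "Observation ID": ytype})
    return merged
```

Each input row can produce several output rows, one per non-empty observation column, with the other columns copied across.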

eatyourgreens commented 6 months ago

Exporting a CSV from the Data tab is close to being done. I might be able to finish that tomorrow.

martinjrobins commented 6 months ago

Some more snags from the Roche team which I'll copy here. I've also included the "Fixed" column to indicate what I think has already been fixed (as far as I can see but you might want to check @eatyourgreens ):

Tab	Problem description	Step	Item	Proposed Fix	Fixed	Comment
Data	No warning for conc/ obs units	1	DV/ conc/ obs	Provide additional text "Concentration units have not been defined in the dataset and need to be defined manually"	Yes
Data	Unclear how ID/ groups are assigned	1	ID warning	Provide additional text "A new subject ID is assigned according to the time column, if time is provided in ascending order for each individual"	Yes
Data	Unable to select a dosing compartment w/o model selection	2	Dosing compartment	Please provide error message if no PK model is selected, e.g., "please select a PK model"	Yes	data upload disabled until pk model is selected
Data	Unable to select a variable w/o model selection	3	Map variable	Please provide error message if no PK model is selected, e.g., "please select a PK model"	Yes
Data	Stratification occurs after this information is first required (step 2)	4	Stratification	Would this be better placed as the 2nd step, as dosing compartment may differ for different groups/ cohorts.	Yes
Data	Unclear how to introduce a new group	4	Stratification	It works with hitting "enter" but is this how it's supposed to work? Not ideal for ipad, iphone. Suggest introducing a "confirm" button.	Yes
Data	For datasets w/o dosing information, the dosing information taken from Trial Design is visible in Data/Protocols but the IDs need reworking	Final	Datafile	tbd		what do you mean the id's need reworking? The id given in the Data/Protocols is the id of the particular protocol, the group # given in the title is associated with the "Group" column in the main dataset
Data	Subject ID is not provided in Data/ Observations	Final	Datafile	tbd	Yes
Data	Not possible to export datafile	Final	Datafile	Include Export Datafile functionality
General	App is quite a bit slower than before			tbd
Data	Replacing a datafile disables "next" button. If you load a datafile and select the time units and then drag and drop a new datafile, the data itself is replaced but you can no longer proceed to the next step	1	replace datafile		Yes
Data	Name of datafile not visible	all steps		it may be useful for the user to be able to document/ display the name of the datafile they uploaded
Simulations	Automatic axis scaling works on the simulated data, which may truncate the observations	-		adjust axis limits on the observed and simulated data	Yes
Trial design	remove group does not work	-			Yes
Trial design	if an additional group is created (w/o data), the dosing protocol still appears in the datafile (dosing information)	-		Do we want to keep it this way? Con: when people export datafile for analysis in Monolix/etc. this information is redundant. Pro: It provides information of all scenarios investigated for reproduction in other software.
Simulations	By default only simulation 1 will be displayed when reloading the app, all other simulations are treated as temporary	-		Do we want to keep it this way?	Yes	all simulations displayed on reload
Models	Once data is loaded, switching between models only works if the amount variables have the same name as the model to which the data were mapped. Works fine for all generic comp models, but does not work when switching to TMDD models or switching from a full TMDD to a QSS TMDD model due to different amount variable names.	-		If such a conflict arises, the user will need to remap the dosing compartment in Model and Data.
Data	Once a datafile is accepted, the only way to edit it is to upload and go through all the steps again.	Final		Can we add an "edit" button for users to change units or grouping etc. w/o having to redo all the steps?		I think the export datafile will solve this, user can then export, edit manually, then reupload
Data	"Dimensionless" is not available for units selection	3		map an empty unit column entry to "dimensionless"	Yes
Data	descending order of YTYPEs	3	YTYPE order	change order of multiple YTYPES to ascending	Yes
Data		Final	Upload New Dataset	provide warning "this action will delete the current dataset"	Yes
Data	data loader does not recognize if two columns are mapped to "Observation"	1		if multiple columns are mapped to "Observation" treat them like different YTYPEs		This fix is incompatible with the Observation ID column, only works if this column is not provided. Can display error if multiple observation columns are provided and an observation id column?
Data		4	Secondary grouping	unclear what that does		Is secondary grouping still useful to have (i.e. sort into groups based on 2 categorical covariates)? Can either delete this, or provide a text description of what it does?
Data	After upload of a datafile and setting the time units, if the user changes a column header, they cannot proceed to the next step and the time units error warning appears but time units can not be set. Reselection of the time unit activates the next button.	1		similar to line 12
Data	Error message for dosing units states "amount units can be set in Trial Design", however, amount units are set on this page	2	Dosing units	remove this error message
Data	next and back buttons should stay in one location to allow quick movement through the stepper	1-4
Data			Upload New Dataset	do we need this, why not start this page with the interface the user sees after pressing "Upload New Dataset"
Data		2	Automated mapping	could we automatically map dosing variables if "Route" information is provided? IV map to A1 variable (e.g., A1, A1_f, A1_t), SC or PO map to Aa
Data	unclear how to select another cat covariate as primary grouping, app always seems to jump back to group	4	Primary grouping
Data	how are the ID assigned for the dosing protocol?
Data	Infusion time = 0 in datasheet			It is likely that we will get data with infusion time = 0 (as Monolix and Nonmem accept those); for the pkpd explorer I suggest that if infusion time = 0, we automatically set this to a very short time interval e.g., 30 seconds.
Data	Multiple dosing with individual dosing events given as separate lines (see right) not recognised			App identifies that multiple doses are administered but does not match the time appropriately. Units of dosing are also incorrectly displayed in mg/kg in Trial Design.
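Two of the suggestions in the table (automated mapping from "Route", and replacing infusion time = 0 with a short interval) could be sketched as follows. The time unit (hours), the prefix matching and all names are assumptions for illustration.

```python
# 30 seconds expressed in hours; assumes infusion times are stored in hours.
MIN_INFUSION_HOURS = 30 / 3600

# Route to amount-variable prefix, following the suggestion in the table.
ROUTE_TO_PREFIX = {"IV": "A1", "SC": "Aa", "PO": "Aa"}


def infusion_duration(hours):
    """Replace a zero infusion time with a very short interval."""
    return hours if hours > 0 else MIN_INFUSION_HOURS


def suggest_dosing_variable(route, model_variables):
    """Suggest the first model variable matching the route (e.g. A1, A1_f,
    A1_t for IV), or None if the route or variable is unknown."""
    prefix = ROUTE_TO_PREFIX.get(route.strip().upper())
    if prefix is None:
        return None
    return next(
        (name for name in model_variables
         if name == prefix or name.startswith(prefix + "_")),
        None,
    )
```

When the suggestion is None, the user would still pick the dosing compartment manually in the stepper.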

pkpdapp-team / pkpdapp

upload data #347
