Closed ivoflipse closed 10 years ago
One benefit of the latter is that in addition to walk/run subcategories, you could have day 0, after surgery, after treatment, 6-month check-up, etc., i.e. track a single subject's data long term. Useful for clinical cases (but yes, less useful for analysis of groups, where the table option would be easier...)
So I seem to have figured out a reasonable structure
Basically every level of nesting has a table that gives an overview of all the groups that are on that level of nesting and contains meta-data which can be used to search for items. The groups get an id, which basically auto-increments, to keep the namespace clean, so in order to find the item you're looking for, you look it up in the table and then get the group you need.
This boils down to a hierarchy like this:
To prevent duplication I've added the requirement to create a subject with a unique combination of first name, last name and birthday. All the others use the name supplied by the user as a unique identifier.
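To make the lookup-table idea concrete, here is a minimal sketch in plain Python. It is a stand-in for the HDF5 layout, not PyTables code: the `Level` class, its `create`/`find` methods and the example subject are all hypothetical names made up for illustration. Each level keeps an overview table of metadata rows, hands out auto-incrementing group ids, and refuses a second entry with the same unique key.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One nesting level: an overview table plus the groups it indexes."""
    prefix: str                                 # e.g. "subject" or "session"
    rows: list = field(default_factory=list)    # metadata rows, one per group
    groups: dict = field(default_factory=dict)  # group id -> child content

    def create(self, unique_key, **metadata):
        # Refuse duplicates based on the user-facing unique key.
        if any(row["key"] == unique_key for row in self.rows):
            raise ValueError(f"{self.prefix} {unique_key!r} already exists")
        group_id = f"{self.prefix}_{len(self.rows)}"  # auto-incrementing id
        self.rows.append({"id": group_id, "key": unique_key, **metadata})
        self.groups[group_id] = {}
        return group_id

    def find(self, **criteria):
        # Search the overview table and return the ids of matching groups.
        return [row["id"] for row in self.rows
                if all(row.get(k) == v for k, v in criteria.items())]


subjects = Level("subject")
# Hypothetical subject; the key is (first name, last name, birthday).
key = ("Jane", "Doe", "1980-01-01")
subject_id = subjects.create(key, first_name="Jane", last_name="Doe",
                             birthday="1980-01-01")
```

Sessions and measurements would use the same pattern with the user-supplied name as the key, so finding an item is always "search the table, then open the group with that id".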
Now I just need to tie it into the models I had, so it no longer stores on disk, but instead creates this table. I'll also need some sort of database widget, because the user needs to be able to supply some subject info.
@lynforster The names were entered manually, so researchers are free to pick whatever they want.
I've also added a table, where you can store the id of a session and some label. That could be used to mark all sessions that are part of a study or label it with different protocol variables.
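As a rough illustration of how such a label table could be used, here is a plain-Python sketch. The row layout `(session_id, label)`, the helper names and the example labels are assumptions for the example, not the actual table definition.

```python
# Each row pairs a session id with one free-form label.
session_labels = []


def add_label(session_id, label):
    """Attach a label to a session, ignoring exact duplicates."""
    row = (session_id, label)
    if row not in session_labels:
        session_labels.append(row)


def sessions_with(label):
    """Return the ids of all sessions carrying the given label."""
    return sorted({sid for sid, lab in session_labels if lab == label})


add_label("session_0", "pilot study")
add_label("session_1", "pilot study")
add_label("session_1", "treadmill")
pilot_sessions = sessions_with("pilot study")
```

Because a session can appear in several rows, the same table covers both "which study is this part of" and "which protocol variables applied".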
I've been adding a database tab to make it a bit more structured to create subjects, sessions and add measurements to them:
It's still very much a work in progress, but the process of making a subject and a session seems to work (apart from some rough edges). I've also just added a tree to browse to the folder that contains the measurements for a given session. The next step is to load these measurements and store them in PyTables, and then to extract the contacts and store those, but that's a problem for later.
One problem is that in the current state the rest of the application is broken, because of some changes I've made, so I really need to pull through and get this working.
And yet another step closer:
I've added some widgets which would be needed to store the measurements, such as the brand of the plate/model/frequency etc.
Storing the data in PyTables also seems to work:
While there are still some details to work out, it's getting somewhere and it shouldn't take too long to re-enable the processing widget (which is currently no longer able to find measurements). Some things I have to work on:
Also I should note that you can just paste in a file path of the folder that contains the measurements for a given session in the field on the right and it will automatically load the files in the folder (if any). Clicking the button next to it brings up a file dialog. The default location is stored in the config file, but I'll probably add a settings tab, so you can easily edit it using a GUI. I'll also figure out how to remember some of the choices made, probably by writing any changes to the config, such as picking a brand, back to the config file.
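Writing a choice back to the config could look roughly like the sketch below. It assumes an INI-style config file handled by the standard library's `configparser`; the actual config format isn't stated here, and the `remember_choice` helper, the `plate` section and the example values are hypothetical.

```python
import configparser
import os
import tempfile


def remember_choice(config_path, section, option, value):
    """Persist a single GUI choice so it becomes the default next time."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently ignored
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, str(value))
    with open(config_path, "w") as handle:
        config.write(handle)


# Demonstration against a throwaway file instead of the real config.
config_path = os.path.join(tempfile.mkdtemp(), "config.ini")
remember_choice(config_path, "plate", "brand", "example_brand")
remember_choice(config_path, "plate", "frequency", 100)

reloaded = configparser.ConfigParser()
reloaded.read(config_path)
```

Reading the file before each write keeps the other settings intact, so individual widgets can save their own choice without knowing about the rest.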
Getting one step closer:
I've managed to store the contact data and restoring is semi-working again, so making progress :+1:
While I'm still not 100% satisfied with the workflow, at least I can load results from PyTables now:
Well, at least on the processing tab that is. There are still some issues, mostly due to the order in which things are done. I tried decoupling a lot of things, but in the end that just results in a ton of additional bookkeeping, because functions don't know what measurement or session we're in.
Another issue is calculating the average, because this spans multiple measurements and the hierarchy as described above doesn't allow for an intuitive way to iterate over all contacts in a single session. Furthermore, my previous implementation just kept this structure in a dictionary, making it easy to look things up. Now, to extract a simple Contact object, I need to iterate over the rows in the contact table AND over all arrays of that contact, combine the two results for each contact and rebuild a Contact instance.
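To show what that rebuild step amounts to, here is a simplified sketch with plain Python containers standing in for the contact table and the per-contact arrays. The `Contact` fields, the ids and the array names are invented for the example and don't match the real class.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    contact_id: str
    label: int
    arrays: dict  # array name -> data stored for this contact


# Stand-in for the rows of the contact table of one measurement.
contact_table = [
    {"contact_id": "contact_0", "label": 0},
    {"contact_id": "contact_1", "label": 2},
]
# Stand-in for the arrays stored in each contact's group.
contact_arrays = {
    "contact_0": {"data": [[0, 1], [2, 3]]},
    "contact_1": {"data": [[4, 5], [6, 7]]},
}


def rebuild_contacts(table, arrays):
    """Combine each table row with its arrays into a Contact instance."""
    return [
        Contact(row["contact_id"], row["label"],
                arrays.get(row["contact_id"], {}))
        for row in table
    ]


contacts = rebuild_contacts(contact_table, contact_arrays)
```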
All of this has to be done without depending on the measurement the GUI thinks we're in, because selecting it through the same route would trigger UI events just to retrieve the data. Worse, to track contacts I also need the measurement data, so I either end up mixing different levels of abstraction or have to split things up, at which point it becomes hard to control the order in which things happen.
I think I'll refactor the code tomorrow, such that it will load all contacts it can find in the tables for the given session and if the currently selected measurement's contacts aren't present, track those. All in one spot, without breaking things up into 'neat', 'reusable' parts, until I can actually separate the concerns properly.
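The planned flow could be sketched like this. It's a simplified outline under assumptions: `load_session_contacts`, its arguments and the fake tracker are hypothetical, and dictionaries stand in for the tables.

```python
def load_session_contacts(stored, measurement_data, current, track):
    """Load every stored contact of a session; track only what is missing.

    stored           -- measurement name -> contacts already in the tables
    measurement_data -- measurement name -> raw data for that measurement
    current          -- name of the currently selected measurement
    track            -- function that extracts contacts from raw data
    """
    contacts = {name: list(found) for name, found in stored.items()}
    if current not in contacts:
        # Only the selected measurement is tracked on demand.
        contacts[current] = track(measurement_data[current])
    return contacts


def fake_track(data):
    # Placeholder tracker: pretends every frame yields one contact.
    return [f"contact_{i}" for i, _ in enumerate(data)]


stored = {"measurement_0": ["contact_0", "contact_1"]}
data = {
    "measurement_0": ["frame_a", "frame_b"],
    "measurement_1": ["frame_a", "frame_b", "frame_c"],
}
result = load_session_contacts(stored, data, "measurement_1", fake_track)
```

Having all contacts of the session in one dictionary also gives the averaging code a single place to iterate over.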
Given PyTables seems 100% integrated, I'm inclined to close this issue. I still need to get used to updating elements, whenever things change after I've initially stored them, but otherwise it seems to work alright.
After discussing the matter with Todd Pataky, I've decided to store all my data in PyTables, which
Furthermore, HDF5 is accessible from other programming languages, most notably Matlab.
Using HDF5 means I have to make a trade-off: either I have 'tables' with all subjects, sessions, measurements, like I did in iApp:
Or I go fully hierarchical, like so:
![image](https://f.cloud.github.com/assets/485186/1033766/e17632fa-0f04-11e3-8984-a356d4a9e2dd.png)
The first option makes it really fast to calculate things over multiple subjects, because you have all information in one table. Whereas with the hierarchical option, you'd have to traverse the hierarchy to get to such information. On the other hand, the hierarchical approach is much more natural to how you think about the data conceptually, all these measurements belong to a subject and not to everybody.
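A toy comparison of the two layouts, using plain Python containers rather than HDF5. The `peak` values and all ids are made up purely to have something to average.

```python
# Option 1: one flat table with a row per measurement.
flat_table = [
    {"subject": "subject_0", "session": "session_0",
     "measurement": "m_0", "peak": 10.0},
    {"subject": "subject_0", "session": "session_0",
     "measurement": "m_1", "peak": 14.0},
    {"subject": "subject_1", "session": "session_0",
     "measurement": "m_0", "peak": 12.0},
]

# Option 2: subject -> session -> measurement hierarchy.
hierarchy = {
    "subject_0": {"session_0": {"m_0": {"peak": 10.0},
                                "m_1": {"peak": 14.0}}},
    "subject_1": {"session_0": {"m_0": {"peak": 12.0}}},
}

# Flat: a group-wide statistic is a single pass over one table.
flat_mean = sum(row["peak"] for row in flat_table) / len(flat_table)


def walk_peaks(tree):
    """Hierarchical: every level has to be traversed to reach the values."""
    for sessions in tree.values():
        for measurements in sessions.values():
            for measurement in measurements.values():
                yield measurement["peak"]


peaks = list(walk_peaks(hierarchy))
nested_mean = sum(peaks) / len(peaks)
```

Both give the same answer; the difference is how much traversal the hierarchical layout needs for cross-subject questions, versus how directly it answers "give me everything for this subject".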
Because of the choice for PyTables, I'm leaning towards the latter option. One issue though is that I need to pick an identifier for each subject, so you can easily access the data, like so:
root.subject_name.session_name.measurement_name
However, subject names are hardly unique, especially when using human names, so I'll probably have to use GUIDs or simply use some form of auto-incrementing name
"subject" + subject_count
and maintain a lookup table to make it easier to find the subject I'm looking for.