andrewelamb closed 4 years ago
@andrewelamb , this could certainly happen in principle -- datasets and publications can be associated with multiple centers/grants and hence consortia. The first one is certainly correct -- it is associated with two different grants from Columbia, one of which is in CSBC and the other in PSON.
OK, no problem, my code wasn't handling this correctly, and I wanted to make sure this was correct before making a fix.
OK I lied, there is a problem! Datasets aren't associated directly with a consortium, but through a grant. The 5 I listed above are only associated with one grant each (as are all datasets, I believe). In other words, the way the database is set up, datasets have a many-to-one relationship with grants. To associate a dataset with more than one grant, we would need to add a table called dataset_grant to capture the new many-to-many relationship. Then, instead of adding another consortium to a dataset, you would need to add another grant.
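A minimal sketch of that junction-table idea, using an in-memory SQLite database from Python. Only the dataset_grant name comes from the comment above; the other table names, the column names, and the sample rows are hypothetical and may not match the real database:

```python
import sqlite3


def build_demo():
    """Create a toy schema where datasets reach consortia only through grants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE consortia (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE grants (
            id INTEGER PRIMARY KEY,
            name TEXT,
            consortium_id INTEGER REFERENCES consortia(id)
        );
        CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT);
        -- Junction table: one row per (dataset, grant) pair.
        CREATE TABLE dataset_grant (
            dataset_id INTEGER REFERENCES datasets(id),
            grant_id INTEGER REFERENCES grants(id),
            PRIMARY KEY (dataset_id, grant_id)
        );
        INSERT INTO consortia VALUES (1, 'CSBC'), (2, 'PSON');
        INSERT INTO grants VALUES (1, 'Grant A', 1), (2, 'Grant B', 2);
        INSERT INTO datasets VALUES (1, 'Dataset X');
        -- Dataset X is funded by both grants, hence belongs to both consortia.
        INSERT INTO dataset_grant VALUES (1, 1), (1, 2);
        """
    )
    return conn


def consortia_for_dataset(conn, dataset_id):
    """Return the sorted consortium names reachable from a dataset via its grants."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM dataset_grant dg
        JOIN grants g ON g.id = dg.grant_id
        JOIN consortia c ON c.id = g.consortium_id
        WHERE dg.dataset_id = ?
        ORDER BY c.name
        """,
        (dataset_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

With this layout, a dataset picks up a second consortium only by gaining a second row in dataset_grant, which matches the "add another grant, not another consortium" point above.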
@andrewelamb the grantName and grantId columns are inconsistent. e.g., if you search for the first dataset SELECT * FROM syn21897968 where "datasetName" LIKE '%An automatec microwell%' you'll see that it has two Columbia grants under grantName, but only one under grantId. This is because grantId is an Entity. When I tried to add multiple Synapse IDs in grantId, it complained that it was not an entity. So, I tried to edit the schema (using the web UI) to make this field a string. But it won't allow that, complaining:
Can not perform schema change on _LIST type columns for Table Entities
Sorry -- I forgot this came up in my late night hacking. I did something expedient and inconsistent. Yet another consequence of working late at night at deadlines ...
Upshot: I suspect the grantName columns are correct and grantId is not -- containing only one of the grants in grantName.
@bswhite Yeah the grantId column is just entity, not a list. @jaeddy can you have entity list as a type? If not we may need to change this to stringList (or even string).
I ran into the same error as you, Brian. James mentioned I needed to be in alpha mode to remove a column; maybe it's the same for changing the column type.
@andrewelamb what is alpha mode and how do I get there? James mentioned this once, but I didn't follow up. We are having trouble editing tables (e.g., the datasets table) using the web UI -- I believe because of multi-value types/annotations. It would be fantastic if alpha mode was a way around this. Otherwise, I'm having to delete all rows in the table and re-upload it in its entirety. That's a pretty inconvenient way to update individual entries in individual rows.
I don't entirely know what alpha mode entails, but I believe it lets you test out features that aren't fully tested yet. You activate it by going to the bottom right corner while logged into Synapse and clicking the green helmet button.
Thanks -- that didn't solve my problem. I'll ask it elsewhere.
@jaeddy how should we handle multiple Synapse IDs in grantId? This is coming up for me in annotating.
Is there an entity list? Should we use a string list? Or just a string? The latter would assume we are accessing this field using LIKE, I believe.
grantName is currently a string. It, too, will have multiple grants. Should this be changed to string list? Probably. There are 3 datasets that show up as having grantName "Center for Cancer Systems Therapeutics (CaST), Columbia University Center for Topology of Cancer Evolution and Heterogeneity" -- whereas they should instead be associated with the two grants "Center for Cancer Systems Therapeutics (CaST)" and "Columbia University Center for Topology of Cancer Evolution and Heterogeneity"
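To make the string-versus-list trade-off concrete, here is a small Python sketch. It emulates a LIKE '%...%' lookup as a plain substring test and compares it with exact membership in a list. The IDs are made up for illustration, and this ignores whatever case rules the real query engine applies:

```python
def like_contains(cell, needle):
    """Emulate `cell LIKE '%needle%'` as a plain substring test."""
    return needle in cell


def list_has(cell, needle):
    """Exact membership test, as a list-typed column would allow."""
    return needle in cell


# Hypothetical IDs: one is a prefix of the other.
grant_ids_string = "syn111, syn2222"
grant_ids_list = ["syn111", "syn2222"]

# Substring matching reports a hit for an ID that is not actually present.
false_positive = like_contains(grant_ids_string, "syn222")
# List membership only matches whole values.
exact_hit = list_has(grant_ids_list, "syn222")
```

Here false_positive is True while exact_hit is False, which is the main risk of keeping multiple IDs in a single string and querying with LIKE.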
@bswhite - the way I've been treating multi-value columns is to only convert to List-types for things we need to facet; for everything else, I'm using a standard comma-separated string (or "|" for institutions, which may have a comma in their name).
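A rough Python sketch of why the delimiter choice matters when turning these strings back into lists. The helper name and the institution string are hypothetical; the grant names are the two mentioned earlier in the thread:

```python
def split_values(cell, sep=","):
    """Split a delimited cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


grant_names = (
    "Center for Cancer Systems Therapeutics (CaST), "
    "Columbia University Center for Topology of Cancer Evolution and Heterogeneity"
)
# Hypothetical institutions cell; the first name contains a comma.
institutions = "University of Example, Springfield | Example Institute"

grants = split_values(grant_names)            # comma works: names have no commas
by_pipe = split_values(institutions, "|")     # two institutions, as intended
by_comma = split_values(institutions, ",")    # wrongly breaks the first name apart
```

Splitting on "," only works as long as no individual value contains a comma, which is why a separate delimiter is used for institutions.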
You raise a good point though that, even if we're not exposing a column as a facet for viz/filtering on the explore page, we might still use it for querying under the hood (e.g., to link to details pages). If we can identify those cases, I can help convert grantId and others to stringList (there isn't currently an entityList type, but I think string should still be fine).
For grantName, I haven't made those true lists yet because they're so long. The way Synapse estimates maximum row size is currently dependent on just the length of strings in the array -- and so blows up pretty quickly. Ziming has added a fix that also lets you specify maxListLength for a column as well, so the estimates are more reasonable. Still, it'd probably be good to make the change in #34 before we try to use grants in lists.
Fixed grantId column in the merged table for datasets with multiple grants. Also made the grant (aka grantNumber) a STRING_LIST and facet, with #34 in mind.
I added dataset <-> grant associations to this table.
These datasets have two consortium ids:
Is that correct?