specify / webportal-installer

The Specify Web Portal
GNU General Public License v2.0

Implement QB-like interface #65

Open grantfitzsimmons opened 2 years ago

grantfitzsimmons commented 2 years ago

KU Herpetology would like to let users search Preparations with more precisely specified criteria. Because of the way the web portal works, this is not currently possible.

It would be nice to tell the advanced search that only the following values will be found in this data: {ETOH, tissue, etc.}.

The current workaround is to create a splash image that details how to search the data, but that is not very clean.

maxpatiiuk commented 2 years ago

For future reference: after talking about this more, @grantfitzsimmons suggested replacing the Web Portal with Specify 7.

Benefits:

We identified the following missing features that must be implemented before this is:

Concerns:

beach53 commented 2 years ago

Related Issues at the Institutional and Community Levels:

mcruz-umich commented 2 years ago

We at U-M are about to launch a bunch of these sites. We just got "Fishes" up today: http://ummz.fishes-specify-portal.apps.gnosis.lsa.umich.edu/

And plan to do one for Birds, Mammals, Mollusks, Insects and more!

The S7 replacement idea is interesting; however, the Specify Web Portal has some features that we would need to see in S7.

  1. Does S7 have the "Map" feature that shows the locations of all the occurrences on a map?
  2. Our portals are currently public, so if we move them from the web portal code to the S7 code and we use SSO in S7, will we have a problem with security? I suppose it also depends on whether the S7-driven version of the web portals runs off of a cache or directly accesses the DB in read-only mode.
  3. One advantage of the web portal over S7 is containers. I am currently deploying one Specify Web Portal per container. This means traffic / load is focused on just one container. If we move to S7 for these apps, would we have the combined load of the Collection Managers doing workbench and heavy query operations alongside the general public doing read-requests of these publicly published sets of data? I would think we want to keep the one-app-per-container focus, for performance reasons and for security too.

Thanks for your consideration and working on these ideas.

Sincerely, Matthew

mcruz-umich commented 2 years ago

I have developed the following process for removing "Tissues" from our publications.

It involves downloading the zip file from DataExporter, then unzipping it, modifying the csv, and then re-zipping it and deploying.

There is a bug in the "Schema Mapper": it checks for duplicate records BEFORE applying the "distinct" checkbox. As a result, you cannot query at the gift or preparation level and then reduce multiple rows down to a single row for the same Cat #, even if you have the columns set to "do not show". The "Schema Mapper" logic needs to process the "distinct" checkbox FIRST, before checking for duplicates. Since that is not working in 6.8.00, I have developed the following process for removing gift records and stripping "Tissue - N" from the aggregated preparations field.

OpenOffice is used below to get a visual view of the csv data, so that rows can be sorted and deleted easily.

  1. SP DATAEXPORTER: Export from DataExporter - this will include gifts and empty preparations

  2. OPENOFFICE: Open the csv in OpenOffice, sort by "preparations" and delete rows that have none

  3. SPECIFY: Run a query to get a list of the gifts

  4. SUBLIME TEXT: Open the csv in Sublime Text, search for the gift numbers, and delete those rows. REGEX: ,(gift_num_1|gift_num_2|gift_num_3|...gift_num_n),

  5. SUBLIME TEXT: Search for "tissue" prepTypes and remove the string but keep the row. This will affect the Preparations-aggregation column and may result in a blank cell if it contained only tissue. REGEX: (tissue( - \d)?;? ?)|(; tissue( - \d)?)+

  6. Open in OpenOffice again and sort by Catalog Number
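
For anyone who would rather script steps 2 through 6 than do them by hand, here is a rough Python sketch of the same clean-up. It is not part of DataExporter or the Web Portal. The column names `preparations` and `catalogNumber` are assumptions and need to be changed to match the real csv header, and the list of gift numbers still has to come from the Specify query in step 3. The tissue regex is the one from step 5, applied case-insensitively here (also an assumption, mirroring a case-insensitive editor search).

```python
import csv
import io
import re

# Hypothetical column names; change them to match the DataExporter csv header.
PREP_COL = "preparations"
CATNUM_COL = "catalogNumber"

# Regex from step 5. IGNORECASE is an assumption so that "Tissue - 2" matches.
TISSUE_RE = re.compile(r"(tissue( - \d)?;? ?)|(; tissue( - \d)?)+", re.IGNORECASE)


def clean_export(csv_text, gift_numbers):
    """Return cleaned csv text, following steps 2, 4, 5 and 6 of the manual process."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    kept = []
    for row in reader:
        # Step 2: drop rows that have no preparations at all.
        if not (row[PREP_COL] or "").strip():
            continue
        # Step 4: drop rows in which any cell is one of the gift numbers.
        if any(value in gift_numbers for value in row.values()):
            continue
        # Step 5: strip tissue entries but keep the row, even if the cell ends up blank.
        row[PREP_COL] = TISSUE_RE.sub("", row[PREP_COL]).strip()
        kept.append(row)
    # Step 6: sort by catalog number (plain string sort).
    kept.sort(key=lambda r: r[CATNUM_COL])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()
```

Usage would be something like `clean_export(open("export.csv", newline="").read(), {"gift_num_1", "gift_num_2"})`, with the file name and gift numbers as placeholders. The result is then written back to the csv before re-zipping and deploying.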

maxpatiiuk commented 2 years ago

  1. We added the ability to do spatial search (https://github.com/specify/specify7/issues/1713) and the ability to plot query results on a map (https://github.com/specify/specify7/issues/1714). Those features will be included in a future release.
  2. The new Specify 7 security & permissions system should help. You can set up anonymous user access, and set some permissions for that user. For more complicated use cases, you could probably resort to the current workflow of making a regular dump of data and importing that data into a separate, public Specify 7 instance.
  3. Getting a more powerful machine might help here. Also, read-only access to Specify 7 should not lead to very high CPU usage. The WorkBench is the most performance-hungry tool, and it won't be accessible to read-only users. The second most power-hungry tool, though, might be the query builder. A similar solution of maintaining a separate Specify 7 instance for the public can be used.

While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both the web portal and Specify 7. The major benefit of replacing the Web Portal with Specify 7 is that Specify 7 is already far more capable by most metrics and will only get more capable, as it is the sole development focus.

mcruz-umich commented 2 years ago

"maintaining a separate Specify 7 instance for the public can be used"

Great idea!

"While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both the web portal and Specify 7. The major benefit of replacing the Web Portal with Specify 7 is that Specify 7 is already far more capable by most metrics and will only get more capable, as it is the sole development focus."

I am now in agreement!