Implement QB-like interface

grantfitzsimmons commented 2 years ago

KU Herpetology would like to allow users to search Preparations in a way that can be specified more accurately. Due to the way the web portal works, this is not possible currently.

It would be nice to tell the advanced search that only the following values will be found in this data {ETOH, tissue, etc.}.

Current workaround is to create a splash image that details how to search the data, but that is not very clean.

maxpatiiuk commented 2 years ago

For future reference: after talking about this more, @grantfitzsimmons suggested replacing Web Portal with Specify 7.

Benefits:

Less code to maintain
Fewer services to set up
All new features added in Specify 7 are automatically in Web Portal
No need to export the data, as the data is already in Specify 7

We identified the following missing features that must be implemented before this is possible:

Add spatial search capability (fixed. https://github.com/specify/specify7/issues/1713)
Ability to plot query results on a map (fixed. https://github.com/specify/specify7/issues/1714)
Add the ability to see attachments in Query Builder (https://github.com/specify/specify7/issues/1711)
Add a record-level permission system, so that access can be restricted to a single record set (https://github.com/specify/specify7/issues/1957)

Concerns:

Some web portals are currently pulling data from different databases. We need to find out how common this is.
Usability is an issue. Query Builder is not appropriate for a non-database crowd; full-text indexing or a simple list of fields to fill out is, like the way we have it now. I think building a web portal with a QB interface would be geeky, powerful, but wrong.

beach53 commented 2 years ago

Related Issues at the Institutional and Community Levels:

National and international aggregators are already serving web portal functions. GBIF is now hosting customized web portals with GBIF software for collections and projects. [https://www.gbif.org/news/5D3ijLXMbpiZDBj0y0z1J/gbif-launches-hosted-portal-service]
Collaborative projects (Symbiota databases) prefer thematic web portals to highlight research interests and specialties.
Only a minority of Specify collections use our portal, mostly smaller places that have no better options, or that want us to host them for lack of in-house IT support or due to campus security concerns.
What is the function of a collections web portal?
-- Promote awareness of the existence and vitality of a collection (marketing): "Look at our nice web site!"
-- Provide a mechanism for researchers to see what is in the collection, to promote use of specimens
-- Identify collection strengths, for internal usage, e.g. seeing on a map where a collection's own specimens are located

mcruz-umich commented 2 years ago

We at U-M are about to launch a bunch of these sites. We just got "Fishes" up today: http://ummz.fishes-specify-portal.apps.gnosis.lsa.umich.edu/

And plan to do one for Birds, Mammals, Mollusks, Insects and more!

The S7 replacement idea is interesting; however, the Specify Web Portal does have some interesting things in it that we would need to see in S7.

Does S7 have the "Map" feature that shows the locations of all the occurrences on a map?
Our portals are currently public, so if we move them from the web portal code to the S7 code and we use SSO in S7, then will we have a problem with security? I suppose it also depends on if the S7-driven version of the web portals runs off of a cache or directly accesses the DB in read-only mode.
One advantage of the web portal vs S7 is containers. I am currently deploying one Specify Web Portal per container. This means traffic / load is focused on just one container. If we move to S7 for these apps, would we then have the combined load of the Collection Managers doing workbench and heavy query operations alongside the general public doing read requests of these publicly published sets of data? I would think we want to keep the one-app-per-container focus, for performance reasons and for security too.

Thanks for your consideration and working on these ideas.

Sincerely, Matthew

mcruz-umich commented 2 years ago

I have developed the following process for removing "Tissues" from our publications.

It involves downloading the zip file from DataExporter, then unzipping it, modifying the csv, and then re-zipping it and deploying.

There is a bug in the "Schema Mapper": it checks for duplicate records BEFORE respecting the "distinct" checkbox. Because of this bug, you cannot query at the gift or preparation level and then reduce multiple rows down to a single row for the same Cat #, even if you have the columns set to "do not show". The "Schema Mapper" logic needs to process the "distinct" checkbox FIRST, before checking for duplicates. Since that is not working in 6.8.00, I have developed the following process for removing gift records and stripping "Tissue - N" from the aggregated preparations field.
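To illustrate why the order matters, here is a minimal sketch of the two orderings. It is not the Schema Mapper's actual code; the function names, column names, and sample rows are all hypothetical.

```python
def find_duplicate_ids(rows, id_col):
    """Return the set of id values that appear on more than one row."""
    seen, dupes = set(), set()
    for row in rows:
        key = row[id_col]
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


def distinct_visible(rows, visible_cols):
    """Keep only the visible columns and collapse identical rows (like "distinct")."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[col] for col in visible_cols)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(visible_cols, key)))
    return unique


# Hypothetical query at the preparation level: one row per preparation,
# with the prepType column set to "do not show".
rows = [
    {"catalogNumber": "1001", "prepType": "ETOH"},
    {"catalogNumber": "1001", "prepType": "Tissue"},
    {"catalogNumber": "1002", "prepType": "ETOH"},
]

# Order described as the bug: duplicate check runs on the raw rows.
dupes_before_distinct = find_duplicate_ids(rows, "catalogNumber")

# Requested order: apply "distinct" on the visible columns first, then check.
collapsed = distinct_visible(rows, ["catalogNumber"])
dupes_after_distinct = find_duplicate_ids(collapsed, "catalogNumber")
```

With the first ordering, catalog number 1001 is flagged as a duplicate; with the second, the rows collapse to one per catalog number before the check runs.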

OPENOFFICE is used below to leverage a visual view of the csv data such that rows can be sorted and deleted easily.

SP DATAEXPORTER: Export from DataExporter - this will include gifts and empty preparations
OPENOFFICE: Open the csv in OpenOffice, sort by "preparations" and delete rows that have none
SPECIFY: Run a query to get a list of the gifts
SUBLIME TEXT: Open the csv in Sublime Text, search for the gift numbers, and delete those rows. REGEX: ,(gift_num_1|gift_num_2|gift_num_3|...gift_num_n),
SUBLIME TEXT: Search for "tissue" prepTypes and remove the string but keep the row. This will affect the Preparations-aggregation column and may result in a blank cell if it contained only tissue. REGEX: (tissue( - \d)?;? ?)|(; tissue( - \d)?)+
OPENOFFICE: Open the csv in OpenOffice again and sort by Catalog Number
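The manual steps above could also be scripted. The following is only a rough sketch under several assumptions: the column names `preparations` and `catalogNumber` are hypothetical and would need to match the real export, the tissue regex from the step above is applied case-insensitively, gift rows are dropped when any cell exactly equals a gift number (the comma-delimited regex above would miss the first and last columns), and the final sort is a plain string sort.

```python
import csv
import re

# Regex quoted in the steps above; case-insensitive matching is an assumption.
TISSUE_RE = re.compile(r"(tissue( - \d)?;? ?)|(; tissue( - \d)?)+", re.IGNORECASE)


def clean_export(rows, gift_numbers, prep_col="preparations", cat_col="catalogNumber"):
    """Apply the manual clean-up steps to a list of row dicts."""
    gifts = set(gift_numbers)
    cleaned = []
    for row in rows:
        # Drop rows that have no preparations at all.
        if not row[prep_col].strip():
            continue
        # Drop rows where any cell is one of the gift numbers.
        if gifts.intersection(row.values()):
            continue
        # Strip "tissue" entries but keep the row, even if the cell ends up blank.
        row = dict(row)
        row[prep_col] = TISSUE_RE.sub("", row[prep_col]).strip("; ")
        cleaned.append(row)
    # Sort by catalog number (string comparison).
    cleaned.sort(key=lambda r: r[cat_col])
    return cleaned


def clean_csv_file(src_path, dst_path, gift_numbers):
    """Read the exported csv, clean it, and write the result to a new file."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = clean_export(list(reader), gift_numbers)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

The unzip and re-zip steps are left out; the gift numbers would still come from a separate Specify query, as in the manual process.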

maxpatiiuk commented 2 years ago

> We at U-M are about to launch a bunch of these sites. We just got "Fishes" up today: http://ummz.fishes-specify-portal.apps.gnosis.lsa.umich.edu/

> And plan to do one for Birds, Mammals, Mollusks, Insects and more!

> The S7 replacement idea is interesting; however, the Specify Web Portal does have some interesting things in it that we would need to see in S7.

> Does S7 have the "Map" feature that shows the locations of all the occurrences on a map?

> Our portals are currently public, so if we move them from the web portal code to the S7 code and we use SSO in S7, then will we have a problem with security? I suppose it also depends on if the S7-driven version of the web portals runs off of a cache or directly accesses the DB in read-only mode.

> One advantage of the web portal vs S7 is containers. I am currently deploying one Specify Web Portal per container. This means traffic / load is focused on just one container. If we move to S7 for these apps, would we then have the combined load of the Collection Managers doing workbench and heavy query operations alongside the general public doing read requests of these publicly published sets of data? I would think we want to keep the one-app-per-container focus, for performance reasons and for security too.

> Thanks for your consideration and working on these ideas.

> Sincerely, Matthew

We added the ability to do a spatial search (https://github.com/specify/specify7/issues/1713) and the ability to plot query results on a map (https://github.com/specify/specify7/issues/1714). Those features will be included in a future release.
The new Specify 7 security & permissions system should help. You can set up anonymous user access and set some permissions for that user. For more complicated use cases, you could probably resort to the current workflow of making a regular dump of the data and importing it into a separate, public Specify 7 instance.
Getting a more powerful machine might help here. Also, read-only access to Specify 7 should not lead to very high CPU usage. The WorkBench is the most performance-hungry tool, and it won't be accessible to read-only users. The second most performance-hungry tool might be the Query Builder, though. The same solution of maintaining a separate Specify 7 instance for the public can be used here.

While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both Web Portal and Specify 7. The major benefit of replacing Web Portal with Specify 7 is that Specify 7 is already far more capable by most metrics and will only get more capable, as it is the sole development focus.

mcruz-umich commented 2 years ago

> maintaining a separate Specify 7 instance for the public can be used

Great idea!

> While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both Web Portal and Specify 7. The major benefit of replacing Web Portal with Specify 7 is that Specify 7 is already far more capable by most metrics and will only get more capable, as it is the sole development focus.

I am now in agreement!

specify / webportal-installer

Implement QB-like interface #65