open-data-rescue / climate-data-rescue

Climate Data Rescue is an archival data rescue platform using Ruby on Rails.
https://citsci.geog.mcgill.ca
MIT License

Slow Loading of Admin>Pages #298

Status: Closed (raakal closed this issue 3 years ago)

raakal commented 3 years ago

The Admin 'Pages' page (https://citsci.geog.mcgill.ca/en/admin/pages) is extremely slow to load because it loads every image on the website/app. It currently lags, times out, or fails to load at all.

The problem is much less severe on the user side ('All Pages', https://citsci.geog.mcgill.ca/en/pages), which only loads the pages available to transcribe. That page is still somewhat slow, but manageable.

A potential fix for this could be pagination.

As an aside, it would help if the column headings stayed visible while scrolling, so that when you scroll down you still know which column is which.

rsmithlal commented 3 years ago (Nov 3, 2020)

@raakal I made some changes to speed up loading the lists of pages and transcriptions, and these have now been deployed. The transcription list loads faster now. I am using counter columns to store the values used to calculate the percent complete, and I had to run some automated processes to make sure those counters stay in sync with the values that were used previously. Please keep an eye on the percent complete values and let me know if anything starts to look off.
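To illustrate the counter-column approach described above, here is a minimal plain-Ruby sketch: each page caches its counts, percent complete is read from the cache without loading any transcription rows, and a periodic resync recounts the real rows and corrects any drift. In Rails this pattern is usually `belongs_to ..., counter_cache: true` with `reset_counters` for resyncing, but the thread does not say how the app does it. `CountedPage`, `PageTranscription`, the counter names, and the "percent complete" formula are all hypothetical.

```ruby
# Plain-Ruby sketch of cached counter columns plus a resync job.
# CountedPage, PageTranscription and the counter names are hypothetical;
# they illustrate the pattern, not the app's real schema.
PageTranscription = Struct.new(:page_id, :complete)

class CountedPage
  attr_reader :id
  attr_accessor :transcriptions_count, :completed_transcriptions_count

  def initialize(id, transcriptions_count: 0, completed_transcriptions_count: 0)
    @id = id
    @transcriptions_count = transcriptions_count
    @completed_transcriptions_count = completed_transcriptions_count
  end

  # Computed from the cached counters only, so no transcription rows are loaded.
  # Hypothetical formula: completed transcriptions / all transcriptions.
  def percent_complete
    return 0.0 if transcriptions_count.zero?

    (100.0 * completed_transcriptions_count / transcriptions_count).round(1)
  end
end

# Recount from the real rows and overwrite any counter that has drifted.
# Returns the ids of pages whose counters were corrected.
def resync_counters!(pages, transcriptions)
  rows_by_page = transcriptions.group_by(&:page_id)
  pages.each_with_object([]) do |page, corrected|
    rows = rows_by_page.fetch(page.id, [])
    total = rows.size
    completed = rows.count(&:complete)
    next if page.transcriptions_count == total &&
            page.completed_transcriptions_count == completed

    page.transcriptions_count = total
    page.completed_transcriptions_count = completed
    corrected << page.id
  end
end

pages = [CountedPage.new(1, transcriptions_count: 2, completed_transcriptions_count: 1),
         CountedPage.new(2)] # page 2's counters were never incremented
rows = [PageTranscription.new(1, true), PageTranscription.new(1, false),
        PageTranscription.new(2, true)]

p resync_counters!(pages, rows) # => [2]
p pages.map(&:percent_complete) # => [50.0, 100.0]
```

Running the resync a second time returns an empty list, which is a cheap way to check that the counters have stayed in sync.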

It still takes far too long to load the pages because there are around 6000 total and it tries to load them all. Pagination is the only feasible solution for this. I'm going to set the default page size to 10 records, but that leaves 600 pages. I will have to combine pagination with some sorting and filtering options so you can meaningfully narrow down the result set. Out of the following table headers, which ones would make the most sense to be able to sort or filter on? How have you or would you like to work with the table of pages?
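The page-count arithmetic above (about 6000 records at 10 per page gives 600 pages) is plain offset/limit pagination. A minimal sketch follows, assuming an in-memory array in place of the database; in ActiveRecord the same idea is usually `.limit(per_page).offset(offset)`, though the thread does not say which pagination mechanism the app uses. `paginate` and `PageSlice` are hypothetical helpers.

```ruby
# Offset/limit pagination sketch. `paginate` and `PageSlice` are hypothetical
# helpers for illustration, not the app's implementation.
PageSlice = Struct.new(:records, :page, :per_page, :total_pages, keyword_init: true)

def paginate(records, page: 1, per_page: 10)
  raise ArgumentError, "per_page must be positive" unless per_page.positive?

  total_pages = (records.size + per_page - 1) / per_page # ceiling division
  page = page.clamp(1, [total_pages, 1].max)             # keep page in range
  offset = (page - 1) * per_page                         # rows to skip
  PageSlice.new(records: records.slice(offset, per_page) || [],
                page: page, per_page: per_page, total_pages: total_pages)
end

ids = (1..6000).to_a # stand-in for ~6000 page records
[10, 15, 20].each do |size|
  puts "#{size} per page -> #{paginate(ids, per_page: size).total_pages} pages"
end
# 10 per page -> 600 pages
# 15 per page -> 400 pages
# 20 per page -> 300 pages
p paginate(ids, page: 600).records.first(3) # => [5991, 5992, 5993]
```

With offset pagination, the number of rows loaded per request is roughly proportional to the page size, so 15 or 20 records per page should still be far lighter than loading all ~6000 at once.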

ID | Filename | Height | Width | Page Schema | Metadata | Date Uploaded | Transcriptions | Visible To Transcribers | Page Completed

The Metadata and Transcriptions columns would have to use a presence/absence filter. I can also add columns specifically for the start and end dates so that you can sort or filter on those values.
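A presence/absence filter, as mentioned above for Metadata and Transcriptions, is naturally tri-state: "any", "has a value", or "has none". The sketch below assumes in-memory rows; in a database-backed version it would typically become an `IS NULL` / `IS NOT NULL` or `EXISTS` condition. `MetaPageRow`, its fields, and the sample values are made up for illustration.

```ruby
# Tri-state presence/absence filter sketch. MetaPageRow and its fields are
# hypothetical; nil means "don't filter", true "has a value", false "has none".
MetaPageRow = Struct.new(:id, :metadata, :transcriptions_count, keyword_init: true)

# Treat nil, empty strings/collections and zero counts as "absent".
def value_present?(value)
  case value
  when nil then false
  when Numeric then value.positive?
  when String, Array, Hash then !value.empty?
  else true
  end
end

def presence_filter(rows, attribute, wanted)
  return rows if wanted.nil?

  rows.select { |row| value_present?(row.public_send(attribute)) == wanted }
end

meta_rows = [
  MetaPageRow.new(id: 1, metadata: { "start_date" => "1871-01-01" }, transcriptions_count: 3),
  MetaPageRow.new(id: 2, metadata: nil, transcriptions_count: 0),
  MetaPageRow.new(id: 3, metadata: {}, transcriptions_count: 1)
]
p presence_filter(meta_rows, :metadata, false).map(&:id)            # => [2, 3]
p presence_filter(meta_rows, :transcriptions_count, true).map(&:id) # => [1, 3]
```

Treating an empty metadata record the same as a missing one is a design choice here; a real implementation might want to distinguish them.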

VickyS08 commented 3 years ago

For my 2 cents:

Filtering by (1) filename, (2) completed, (3) transcriptions, (4) metadata, (5) page schema, and (6) visible to transcribers would be helpful.

If it helps make it faster, we don’t really need height and width here, and date uploaded, while helpful, could be accessed through Metabase or elsewhere if needed. I’m not sure taking these out will make things much faster, though.

We will eventually have around 10,000 pages. This may also have been behind our idea of having a second database for post-processing, to hive off some of the completed pages and make things faster? I really can’t remember now.

This matters because we may need to access any page to look up and “correct” occasional entries. Hopefully there won’t be many, but we won’t know beforehand which ones they will be.

Thanks!

Vicky

raakal commented 3 years ago

Thanks for doing this @rsmithlal !

For pagination, would 15 or 20 records per page work? Or will that still overload the system?

In terms of what to sort/filter by, I think the ones which Vicky highlighted are our most important, especially Metadata, Filename, Visible to Transcribers and Page Completed.

To put it into context, this page is often used to check and update Metadata by one of our students (does it have metadata, is it visible to the transcriber, etc.), to check if a page has already been uploaded to the site to ensure no duplications, and to check if a page has been completed by a user (checking against Metabase).

Will we be able to filter multiple options at once? Or will we be only able to filter by one option at a time?

Otherwise looks good!

rsmithlal commented 3 years ago

@VickyS08 @raakal I'm working on building a new javascript-based paginated and filterable admin data table for the pages. Will be trying to recreate or improve the data displayed in the existing table where possible.

Here is a screenshot sample of what I'm building for your feedback.

[Screenshot: sample of the new paginated, filterable pages table]

raakal commented 3 years ago

Hi @rsmithlal This looks great! To clarify, I would be able to type a specific thing in the white boxes at the top and bottom (like ID and Start Date) in order to search?

rsmithlal commented 3 years ago

Yes, that's right! I also have a boolean (yes/no) filter for the visible and complete columns, and will add those for metadata and transcriptions as well as a dropdown filter for schema.
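Raakal asked earlier whether several filters can be applied at once. In a typical data table each active filter narrows the result further, so text, dropdown, and yes/no filters combine with AND. The sketch below shows that behaviour; whether the deployed table works exactly this way is not stated in the thread. `FilterablePage`, its fields, and the sample filenames are made up.

```ruby
# Sketch of several filters applied at once, combined with AND.
# FilterablePage, its fields and the sample filenames are made up.
FilterablePage = Struct.new(:id, :filename, :schema, :visible, :completed, keyword_init: true)

# nil (or a blank filename) means "any"; every other filter must match.
def apply_filters(pages, filename: nil, schema: nil, visible: nil, completed: nil)
  needle = filename.to_s.strip.downcase
  pages.select do |page|
    (needle.empty? || page.filename.downcase.include?(needle)) &&
      (schema.nil? || page.schema == schema) &&
      (visible.nil? || page.visible == visible) &&
      (completed.nil? || page.completed == completed)
  end
end

sample_pages = [
  FilterablePage.new(id: 1, filename: "1871_jan_p1.jpg", schema: "Daily", visible: true, completed: false),
  FilterablePage.new(id: 2, filename: "1871_jan_p2.jpg", schema: "Daily", visible: true, completed: true),
  FilterablePage.new(id: 3, filename: "1872_feb_p1.jpg", schema: "Monthly", visible: false, completed: false)
]
p apply_filters(sample_pages, filename: "1871", visible: true).map(&:id)    # => [1, 2]
p apply_filters(sample_pages, filename: "1871", completed: false).map(&:id) # => [1]
p apply_filters(sample_pages, schema: "Monthly", visible: true).map(&:id)   # => []
```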

raakal commented 3 years ago (Dec 30, 2020)

It looks perfect then! I like that we will be able to filter by certain criteria and then also page through the results. I think it will really help with the slow loading we've been having.

As an aside, is it possible to do something similar to the user accessible All Pages? Maybe not the intensive filtering which we have, but perhaps pagination? I'm not sure how much extra work this would be of course.

VickyS08 commented 3 years ago

That looks great. I’ve noticed you’ve been working hard and was torn between saying “great work” and “take a break during the holidays”!

One thing I need to check frequently, for admin and for typing-speed and data-entry estimates, is the number of data entries on a completed page. Would this show up under “Transcriptions”?

Not sure if this would be on the same table or on a different page, but I also need to cross-check the hours on a transcriber’s invoice against the pages transcribed, the date transcribed, and the number of fields transcribed to get an idea of costings (how long it takes, whether different pages/page types take longer, etc.). Could these fields be added, or should they be on a different page? Either way, if they can be searched and filtered too, that would be fantastic as well.

As an aside, Rob, because of difficulty with my hands and with typing, I often find it hard to type capital letters, since that means holding down the Shift key and a letter key at the same time. This is why many fields I create don’t start with capitals. Let’s discuss standards some time so we’re all doing the same thing with new fields, help text, etc.

Vicky

rsmithlal commented 3 years ago

Thanks, @VickyS08! Holidays are the only time I get big chunks of time to dedicate to my own projects, so I can't help but dive into them when I'm lucky enough to have time off!

I created another issue to address your request to add some business-related metrics to the transcriptions list. I will include that when I redo the transcriptions table to use this new mechanism.

Thanks for letting me know about your experience with data entry. Let's discuss that soon and figure out where we can automate some of that work. If field names always need every word capitalized in English but only the first word capitalized in French, we can handle that when the value is saved.
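Handling capitalization "when the value is saved" could look like the normalizer below: every word capitalized for English, only the first word for French. In a Rails model it might be called from a `before_save` callback, but the thread does not say how (or whether) this was implemented. `normalize_field_name` and `capitalize_first` are hypothetical, and the locale symbols `:en`/`:fr` are assumptions.

```ruby
# Hypothetical save-time normalizer: English field names get every word
# capitalized; French ones get only the first word capitalized.
def capitalize_first(word)
  word[0].upcase + word[1..] # leave the rest of the word as typed
end

def normalize_field_name(name, locale)
  words = name.strip.split(/\s+/) # also collapses repeated spaces
  return "" if words.empty?

  case locale
  when :en then words.map { |w| capitalize_first(w) }.join(" ")
  when :fr then [capitalize_first(words.first), *words.drop(1)].join(" ")
  else raise ArgumentError, "unsupported locale: #{locale.inspect}"
  end
end

puts normalize_field_name("max temperature", :en)      # Max Temperature
puts normalize_field_name("  wind   direction ", :en)  # Wind Direction
puts normalize_field_name("température maximale", :fr) # Température maximale
```

The sketch only upcases first letters and never lowercases the rest, so acronyms and proper nouns that were typed correctly survive unchanged.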

rsmithlal commented 3 years ago

I started working on a new version of the admin interface built entirely in JavaScript to be able to support the new Pages data table. I was having a bunch of layout issues because of a version conflict.

I've built the new app so it can fit seamlessly with the existing admin interface, and is only activated when visiting the URL to manage the pages. I will expand it to power more Admin pages such as the Transcription list as I go.

Below is a snapshot of what it currently looks like (unfinished). You can access the other admin pages using the old interface from the "Admin" nav dropdown just like you do now.

[Screenshot: current (unfinished) version of the new admin interface]

rsmithlal commented 3 years ago

Quoting @raakal: "As an aside, is it possible to do something similar to the user-accessible All Pages? Maybe not the intensive filtering which we have, but perhaps pagination? I'm not sure how much extra work this would be of course."

@raakal Pagination on that section would be helpful! I'd like to also improve the layout of how we present the data to the user if we make changes to it. If you have any suggestions for how to improve the public-facing page listing, please open a new issue and add them there. It would also be great to have any sketches or mockups that you might create to help me understand your vision for the page.

rsmithlal commented 3 years ago (Dec 31, 2020)

@raakal @VickyS08 I'm pleased to announce that the new pages table is now completed and deployed to the test site for testing and feedback! https://test.citsci.geog.mcgill.ca/en/admin/pages. Please test it out and let me know what you think over the next few days. If you have any suggestions for the order of the table columns, please let me know. When any suggestions or issues have been dealt with and you're happy with the result, I will deploy a new version to the production app to take care of that nasty page load issue you're having!

There are a couple of known limitations or intentional decisions that you should know about.

First, the navigation bar is a work in progress and does not display the user-facing pages. I'm not sure if I will add them in for the admin interface or not. If it's something that you use a lot and would be beneficial, let me know. This new interface is only active on the pages screen for now, so it shouldn't affect you in the other admin sections such as the custom content pages.

It doesn't display your current user info, but it does provide links to your profile and to log out. The language switcher works well and ensures that the data should load in the right language where appropriate, but the new interface is not yet using translations.

The filters for schema also only display what is currently loaded and not all schemas, but I will address that soon.

I changed how the transcriptions are displayed to avoid having to go fetch the user data. If you find it useful enough to want me to add it back, I can get the user info to display the name of the user who did the transcription.

It's been a long hard slog and lots of speed bumps and road blocks along the way, but I'm really pleased with the result! This new interface will serve as the foundation for future app UI improvements as I slowly convert other sections of the site to use this new technology. I'm intentionally replacing the admin interface piece by piece before I tackle the public-facing interface, as it will allow us to work out the kinks and build up a larger library of reusable components before taking a deep dive to replace the public interface.

Happy new year!!

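[Screenshot of the new pages table: https://user-images.githubusercontent.com/11048570/103427312-26017100-4b8e-11eb-8584-38532a3911eb.png]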

VickyS08 commented 3 years ago

@Rob, sorry, quick note to say I’ve been sick (hopefully just a migraine), so I’ve been off the computer for the past few days and am just checking in on emails. I’m not sure I’ll be able to look at screens before everyone’s back to work on Monday.

VickyS08 commented 3 years ago

It looks great; a lot of very hard work went into it! I like it a lot and it will make things much easier. I can only look very briefly for health reasons, so apologies for what may seem like very curt remarks on a wonderful job.

A couple of very quick comments:

  1. I think it’s OK not to have the user navigation bar, but an easy link somewhere to the main site could be helpful. I/we often use this page to help users and track down problems they’re having with their transcriptions or pages, so a link to the help and FAQs is sometimes useful.

  2. Do we need both date created and date updated? They take up some “eyeball” real estate and I’m not sure how useful they are; maybe just updated is enough. Same for title and filename: I would only keep one. Rachel, thoughts? Maybe we can bring this to a meeting with Brittany and Jaz.

  3. As I mentioned before, I do need user ID but I have been getting it through the transcriptions page, which means I usually have several tabs open and switch between them to collect the information I need, not just for “business” but also to help users who are having difficulties. As we discussed, this could maybe be better added to the transcriptions page. Need to think about it.

rsmithlal commented 3 years ago

Thanks for your feedback @VickyS08! I hope you feel better soon.

Is there anything that you have found that would be a blocker for releasing this to the production app? The adjustments and tweaks can be done afterwards if they are not preventing you from doing the work you need to do.

To address point 1, you can access the root URL of the site by clicking on the DRAW logo in the nav bar.

Regarding point 2: at the moment the default sort for the pages list is on the created_at column, so if we remove that column we will have to choose another column for the default sort. We can definitely drop the title column if it's not useful for you; that information is also available in the filename, start date, and end date columns.

Regarding point 3: the transcriptions page and table are next in line for the new interface, so please make any suggestions for improving that page in another issue, so we can keep things on topic and keep that discussion and work in a single place.

There is also the potential for future improvements to these tables to be able to toggle columns on and off, if that's helpful.
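On the default sort discussed in point 2: whichever column becomes the default, two details tend to matter for a paginated table. If several pages share a value in the sort column, adding a unique tie-breaker such as `id` keeps the order stable, so rows don't shift between pages; and restricting sorting to a fixed list of columns avoids passing arbitrary request input into the query's ordering. The sketch below assumes in-memory records; `SortablePage`, `SORTABLE_COLUMNS`, and `sort_pages` are hypothetical names.

```ruby
# Sketch of whitelisted, deterministic sorting for the pages table.
# SortablePage, SORTABLE_COLUMNS and sort_pages are hypothetical names.
SortablePage = Struct.new(:id, :created_at, :title, keyword_init: true)

SORTABLE_COLUMNS = %i[id created_at title].freeze

def sort_pages(pages, column: :created_at, direction: :desc)
  unless SORTABLE_COLUMNS.include?(column)
    raise ArgumentError, "cannot sort on #{column.inspect}"
  end

  # Tie-break on id so equal values always come back in the same order.
  sorted = pages.sort_by { |page| [page.public_send(column), page.id] }
  direction == :desc ? sorted.reverse : sorted # anything else means ascending
end

sortable = [
  SortablePage.new(id: 1, created_at: Time.utc(2020, 11, 1), title: "b"),
  SortablePage.new(id: 2, created_at: Time.utc(2020, 11, 1), title: "a"),
  SortablePage.new(id: 3, created_at: Time.utc(2020, 10, 1), title: "c")
]
p sort_pages(sortable).map(&:id)                               # => [2, 1, 3]
p sort_pages(sortable, column: :title, direction: :asc).map(&:id) # => [2, 1, 3]
p sort_pages(sortable, column: :id, direction: :asc).map(&:id)    # => [1, 2, 3]
```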

raakal commented 3 years ago

@rsmithlal Looks great to me! I tested a few of the functions and I think it works for what Jaz and Brittany would use it for.

1. I don't see a need for a link to the FAQs if you can click the DRAW logo to get back to the main area of the site.

2. While the created_at column might not be needed for the sort of work we usually do, I don't think we necessarily need to remove it either. I think Title should remain, as that will likely be where I search for the year and date of files (especially if I don't know the ID because it's separate from a user-related issue).

3. I'll see about creating a new issue for the things brought up here unrelated to this issue.

Overall, I think it looks good for releasing this to the production app. As you said, we can tweak things as needed since we are using it behind the scenes!

rsmithlal commented 3 years ago

Awesome, thanks @raakal!

rsmithlal commented 3 years ago

This has now been deployed to the production/live site! I will post about it in Slack in the morning.

New to-dos after review:

rsmithlal commented 3 years ago

I'm going to close this issue now that the speed problem has been resolved. I will create a new issue to discuss improvements to the page.