rism-digital / muscat

🗂️ A Rails application for the inventory of handwritten and printed music scores
http://muscat-project.org

Inaccurate figures in Statistics #1243

Closed xhero closed 1 year ago

xhero commented 2 years ago

The statistics page sometimes shows slightly off figures in the Source count for users. This is due to how this data is indexed in Muscat:

https://github.com/rism-digital/muscat/blob/6cb079d42a3daf26b4e1c8cb931248dd84ae5480/app/models/source.rb#L248-L250

and

https://github.com/rism-digital/muscat/blob/6cb079d42a3daf26b4e1c8cb931248dd84ae5480/app/models/source.rb#L254-L256

This comes in handy when searching in the "sources" page, but since all these values are collated together, it is not possible to associate a single modification with a single date by a single user. For example, user "182" has two "phantom" sources in May and June 2021. These sources are 990008602 and 991021277. Looking closely, they both have a Holding by user 182 and another Holding that was created in one of these two months, causing the false hit. The easiest solution is to restrict the view to only Sources that are created, and possibly have a second view for Holdings. In this case we can have specific index values for created_at and user that will not overlap. I already tested this and it works. If, on the other hand, we need to count Holdings and Sources together, we will need to tweak the index a bit to make it work.
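
To make the false hit concrete, here is a minimal plain-Ruby sketch of the effect. It does not use Muscat's real models or its index: the `Source` and `Holding` structs, the helper methods, the record id `1`, the user ids `5` and `7`, and all dates are assumptions invented for illustration. Only the overall pattern (a Holding by user 182 plus another Holding created in the queried month) follows the description above.

```ruby
require "date"

# Hypothetical, simplified records (not Muscat's actual models).
Holding = Struct.new(:user_id, :created_at)
Source  = Struct.new(:id, :user_id, :created_at, :holdings)

# Collated index document: users and dates of the source and all of its
# holdings are flattened into two independent multi-valued fields.
def collated_doc(source)
  {
    id: source.id,
    users: [source.user_id, *source.holdings.map(&:user_id)],
    dates: [source.created_at, *source.holdings.map(&:created_at)]
  }
end

# A query on the collated fields matches user and date independently,
# so the two values may come from different records.
def collated_hit?(doc, user_id, range)
  doc[:users].include?(user_id) && doc[:dates].any? { |d| range.cover?(d) }
end

# Restricted view: only the source's own creator and creation date.
def source_hit?(source, user_id, range)
  source.user_id == user_id && range.cover?(source.created_at)
end

# Separate view for holdings: user and date must belong to the same holding.
def holding_hit?(source, user_id, range)
  source.holdings.any? { |h| h.user_id == user_id && range.cover?(h.created_at) }
end

# Invented example: user 182 added a holding in 2020, and a different
# user (7) added another holding in May 2021.
source = Source.new(1, 5, Date.new(2019, 3, 10), [
  Holding.new(182, Date.new(2020, 1, 15)),
  Holding.new(7, Date.new(2021, 5, 20))
])
may_2021 = Date.new(2021, 5, 1)..Date.new(2021, 5, 31)

puts collated_hit?(collated_doc(source), 182, may_2021) # true: the "phantom" hit
puts source_hit?(source, 182, may_2021)                 # false
puts holding_hit?(source, 182, may_2021)                # false
```

With separate, non-overlapping values for Sources and for Holdings, user 182 no longer gets credit for May 2021, while user 7 still shows up in the Holdings view for that month.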

jenniferward commented 2 years ago

Personally I'd like to know when people create holdings as well. Maybe having it separate has an advantage, because you can see where the efforts are going (new records/holdings).

xhero commented 2 years ago

We could have 4 tabs, in the table:

Sources per User | Holdings per User | Sources per Workgroup | Holdings per Workgroup

BaMikusi commented 2 years ago

I agree that it would be nice to have a more refined overview of which user is engaged in what sort of cataloguing -- this should be helpful for us editors to better understand the work going on, but arguably also for the catalogers who need to assess their own work. So, if we can show results for 'Sources' and 'Holdings' separately, and also filter out the 'background noise' resulting from the activities of other catalogers, that would be great.

At the same time, I am a bit uncertain whether the info available in Muscat regarding working groups is precise enough to allow for meaningful groupings -- in the present statistical summary, for instance, we have categories like "D-" vs. "Germany", or "PL-" vs. "Poland" and even "Polen", and so it remains a bit vague precisely which catalogers should be considered as belonging to the same working group. (For the annual statistics I as a rule download the spreadsheet and transform it in diverse ways manually to come to a meaningful result.)

jenniferward commented 1 year ago

"Sources per User" and "Holdings per User" are now available in the Statistics.
@BaMikusi Do we still want tabs for sources/holdings per working group, or is the spreadsheet enough?

BaMikusi commented 1 year ago

Sorry, this question has been snowed under a bit... But my longer comment above was actually meant to imply that there is no need for specific statistical functions for the working groups, since the borderlines have become so blurred that I do the addition on the basis of the individual spreadsheet anyway. So, what we really needed (Sources per User / Holdings per User) we do have now, and the rest that came up as a potential further development does not seem necessary. With this summary I encourage Rodolfo to close this (for I cannot).