openwebwork / webwork2

Course management front end for WeBWorK
http://webwork.maa.org/wiki/Main_Page

Library Browser showing OPL problem multiple times #2387

Open Alex-Jordan opened 3 months ago

Alex-Jordan commented 3 months ago

I see the following in both production and my develop server.

  1. Go to Library Browser.
  2. Click Advanced Search for the OPL
  3. For "Textbook", select "Calculus: Concepts and Contexts by James Stewart (edition 5)"
  4. For "Text chapter" choose "12. Vector geometry"
  5. For "Text section" choose "1. Coordinate systems"

At this point there should be 17 matching problems. Click "View Problems". I see the first problem, Library/UMN/calculusStewartET/s_12_1_10.pg, three times. I also see Library/UMN/calculusStewartET/s_12_1_9.pg three times. And there are more examples. So far, I only see this with UMN problems.

drgrice1 commented 3 months ago

It seems that WeBWorK 2.18 also does this. So this is not a regression with develop. Of course it is a bug nonetheless.

Alex-Jordan commented 3 months ago

I see it also in a different search with Library/Rochester/setVectors1space3D/UR_VC_1_2.pg, so it is not just UMN problems.

drgrice1 commented 3 months ago

So this is probably technically an OPL bug, and not a webwork2 bug. The issue is a structural flaw in the design of the OPL database that goes back to 2014, when cross listing of problems in different subject areas was added to Taxonomy2. That implementation was not well thought out. It requires that at least the DBsubject be selected in order to uniquely identify the cross listings of a single problem.

So what is happening for the particular example in your first comment in this issue is that the Library/UMN/calculusStewartET problems for the textbook "Calculus: Concepts and Contexts by James Stewart (edition 5)", chapter "12. Vector geometry", and section "1. Coordinate systems" are all cross listed for "Calculus - multivariable", "Linear Algebra", and "Geometry". Each is therefore stored in the OPL_pgfile table three times (once for each cross listed subject). None of the top three selects in the advanced tab, which correspond to DBsubject, DBchapter, and DBsection, are selected, and we are filtering only by textbook, textchapter, and textsection. That does not uniquely identify the cross listing, and so you get all three copies of each problem.

To see this, select the textbook, chapter, and section you mentioned. At this point it shows 24 matching WeBWorK problems. Now select "Calculus - Multivariable" for the database subject (the first select on the advanced tab). Then it drops to 8 matching WeBWorK problems. If you select "Linear Algebra" it also matches 8, and the same for "Geometry". Initially each problem is listed 3 times because it is cross listed in those 3 subjects. Selecting one of the subjects eliminates the cross listings.
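As a rough illustration of the effect described above, here is a minimal sketch using an in-memory SQLite database. The single `pgfile` table and its columns are a simplified, hypothetical stand-in for the real OPL schema (which spreads this information across several tables), and only two of the problem files are included, so the numbers are smaller than in the Library Browser.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for OPL_pgfile: one row per
# (problem file, cross listed subject) pair. Not the real schema.
conn.execute(
    """
    CREATE TABLE pgfile (
        pgfile_id   INTEGER PRIMARY KEY,
        subject     TEXT,  -- stands in for DBsubject
        path        TEXT,
        filename    TEXT,
        textsection TEXT   -- stands in for the textbook section link
    )
    """
)

subjects = ["Calculus - multivariable", "Linear Algebra", "Geometry"]
files = ["s_12_1_9.pg", "s_12_1_10.pg"]
rows = [
    (subject, "Library/UMN/calculusStewartET", filename, "1. Coordinate systems")
    for filename in files
    for subject in subjects
]
conn.executemany(
    "INSERT INTO pgfile (subject, path, filename, textsection) VALUES (?, ?, ?, ?)",
    rows,
)


def count_matches(subject=None):
    """Count rows for the text section, optionally restricted to one subject."""
    sql = "SELECT COUNT(*) FROM pgfile WHERE textsection = ?"
    params = ["1. Coordinate systems"]
    if subject is not None:
        sql += " AND subject = ?"
        params.append(subject)
    return conn.execute(sql, params).fetchone()[0]


textbook_only = count_matches()          # every cross listing is matched
with_subject = count_matches("Geometry")  # one row per problem file

print(textbook_only, with_subject)
```

With two files cross listed in three subjects, the textbook-only filter matches three rows per file, while adding a subject filter matches one row per file. This mirrors the drop from 24 to 8 seen in the Library Browser.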

To fix this, either the OPL database will need to be redesigned to properly handle cross listings, or the database queries in getDBListings will need to be revised. The most efficient result requires redesigning the database, but revising the queries in getDBListings also works, with some loss of efficiency. I am working on finding queries that work relatively well in all cases, but am not quite there yet.

Note that we have been lying about how many problems are in the OPL and Contrib since this already affects the counts when no database subject is selected. On the develop branch with the v2023-04-30 OPL release it shows 37,515 problems in the OPL and 11,087 problems in Contrib. I used a revised database query to find the correct numbers, and there are actually only 29,763 problems in the OPL, and 7,916 in Contrib.
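One way such a revised query could work is to collapse the cross listings by path and file name before listing or counting. The sketch below is only an illustration of that idea, not the actual getDBListings query: the `pgfile` table is a simplified, hypothetical schema, and `Library/Example/single.pg` is an invented file used to show a problem that is not cross listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table: one row per (file, subject) cross listing.
conn.execute(
    """
    CREATE TABLE pgfile (
        pgfile_id INTEGER PRIMARY KEY,
        subject   TEXT,
        path      TEXT,
        filename  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO pgfile (subject, path, filename) VALUES (?, ?, ?)",
    [
        # Cross listed in three subjects, as described in this issue.
        ("Calculus - multivariable", "Library/UMN/calculusStewartET", "s_12_1_9.pg"),
        ("Linear Algebra", "Library/UMN/calculusStewartET", "s_12_1_9.pg"),
        ("Geometry", "Library/UMN/calculusStewartET", "s_12_1_9.pg"),
        # Invented example of a file listed under a single subject.
        ("Geometry", "Library/Example", "single.pg"),
    ],
)

# Naive count: every cross listing is counted as a separate problem.
naive_count = conn.execute("SELECT COUNT(*) FROM pgfile").fetchone()[0]

# Deduplicated count: one per distinct (path, filename) pair.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT path, filename FROM pgfile)"
).fetchone()[0]

# Deduplicated listing: group the cross listings of each file together.
listing = conn.execute(
    """
    SELECT path, filename
    FROM pgfile
    GROUP BY path, filename
    ORDER BY path, filename
    """
).fetchall()

print(naive_count, distinct_count)
for path, filename in listing:
    print(f"{path}/{filename}")
```

Grouping at query time leaves the stored data untouched, which is why it costs some efficiency compared with a schema that stores each problem file once and keeps its cross listings in a separate table.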