huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
240 stars 80 forks source link

Ability to use curly and straight quotes interchangeably #4963

Open fnocetti opened 2 years ago

fnocetti commented 2 years ago

Is your feature request related to a problem? Please describe. Some syntaxes for search queries, like those for exact match and matching particular fields imply the use of straight double quotes. e.g: attachments.mimetype:"image/png". From #4816 (and some conversations regarding that issue) it looks like for some users the default quote when using Uwazi is the curly one and it's not as straightforward to detect the mistake, or even type a straight quote instead of a curly one.

Describe the solution you'd like We can consider replacing the curly quotes with straight ones on the fly, since curly ones do not seem to participate in any particular syntax.

iJustErikk commented 1 year ago

I'll be taking this on, any problems? I also think users should be able to interchange within a single query eg. attachments.mimetype:“image/png". This would be helpful in a partial copy-paste, when one does not get the last quote, but types a regular after.

iJustErikk commented 1 year ago

Hi, how should I go about adding a test case for this? Should this go into the spec tests for search or E2E? Right now I am trying to create an entity with a jpg attachment (batman) and search for it. I am not sure if this is the right way to go about testing this.

Update: I am being silly. I ended up writing a mock for this

iJustErikk commented 1 year ago

Could someone add me to this ticket and review my PR please? I am ready to take this to the next level 🚀

fnocetti commented 1 year ago

Hi @iJustErikk , I went through your PR and added a couple comments. @RafaPolit I think this should include an escaping mechanism. What do you think? Per https://lucene.apache.org/core/2_9_4/queryparsersyntax.html the Lucene escape char is \, we should use the same.

RafaPolit commented 1 year ago

I have two questions:

Maybe this transformation can be done only in client to prevent the API from being polluted with such problems and allow direct API calls that actually can discern between the two?

RafaPolit commented 1 year ago

@iJustErikk Maybe hold on on the actual implementation and move forward with the e2e. The e2e will still prove useful whatever approach we decide to implement.

LaszloKecskes commented 1 year ago

@iJustErikk @fnocetti I share Rafa's concerns. On the first glance, going forward with the e2e seems like a good next step, as a form of TTD. Extending the e2e at e2e/suite1/fullTextSearch.test.ts seems like a good candidate for that. Let's try to extend that with our expected use cases and get some properly failing tests (i.e. ones that do not fail because of an error, but fail on a resut not matching the expected result).

Also, @iJustErikk I have assigned you to the issue with Fede and I.

iJustErikk commented 1 year ago

Escaping the quotes would erase the quotes from the query. I do not think that would be a valid query. So I do not think using \" would be necessary. @fnocetti

I think a more general approach would work better. It feels "wrong" to do it like this. I was thinking of doing this from the client, but then this would not work for the backend. Maybe we could have an "escape" flag and pass that into the search endpoint. Then I can make a searchEscape function or something along those lines. @RafaPolit

Thank you @LaszloKecskes for the help with the E2E test location. I'll get to writing the E2E test.

iJustErikk commented 1 year ago

Has anyone run into:

{ "ok" : 0, "errmsg" : "not master", "code" : 10107, "codeName" : "NotWritablePrimary" } 2022-12-07T21:30:15.697-0500 Failed: no reachable servers

I am trying to run the blank-state script. This happens everytime. Surely, it has to hit the master node sometime? I've tried destroying the containers and volumes and rerunning docker compose. And more.

Upon log inspection (mongodb), I get some messages about the replication:

Did not find local replica set configuration document at startup; NoMatchingDocument: Did not find replica set configuration document in local.system.replset

Not sure what this means for me. If it had any issues starting replication, then I am not sure what I could have hit earlier if it was not master...

I have no issue hitting whatever I am hitting:

NETWORK [listener] connection accepted from 172.18.0.1:35338 #4 (1 connection now open)

Any thoughts? I've spent 3 hours too much on trying to figure this out

iJustErikk commented 1 year ago

I'll have to wrap this up soon. I do not know if I can get around this mongodb issue. If I cannot make any progress on that, I'll add what code I have for the E2E tests with steps to finish it so that an able-bodied dev can finish the tests in <20 minutes. I can also discuss/implement a more general escaping solution

LaszloKecskes commented 1 year ago

@iJustErikk I have tried the following steps to check: 0, Removed docker containers, images and volumes to make sure I start from a clean state. 1, git: cloned the fork, entered the folder, checked out the work branch. 2, yarn install 3, docker compose up 3, run rs.initiate() in the mongo shell to initialize the replice set which is set in the docker 4, yarn blank-state

I've encountered a different error on the migration step. The cause was that Uwazi (and specifically, mongoose) does not pick up the docker names for hosts on my ubuntu. To resolve it I've had to reconfigure the replica set host according to https://www.mongodb.com/docs/v4.4/reference/method/rs.reconfig/#mongodb-method-rs.reconfig i.e. I've run in the mongo shell: cfg = rs.conf(); cfg.members[0].host = 'localhost:27017'; rs.reconfig(cfg);

After that, everything worked as expected, so I was not able to reproduce your specific error.

If you are 'on a clock', that is understandable, so don't worry! Please contribute as much as you can, and we will pick it up from there.

iJustErikk commented 1 year ago

Hi, I ran out of time with this :) It was great working with everyone

I have part of the E2E tests: describe('Ability to use curly and straight quotes interchangeably', () => { it('Searchable replacement', async () => { // create entity with batman/other image await createEntityWithSupportingFiles(entityTitle, filesAttachments, webAttachments); // create entity without supporting file // ... await clearAndType(searchBoxSelector, 'attachments.mimetype:“image/jpeg"'); await expect(page).toClick('.fa-search'); await page.waitForSelector('CLASS MEANING ENTITY EXISTS'); // get entity contents // expect them to contain batman and not contain the other entity we created }); // another test for curly containing query that yield nothing- perhaps another type });

I am using createEntityWithSupportingFiles from the entity E2E tests. I would have put these shared behaviors into a "behaviors.ts" file, so any number of E2E tests could call them.

iJustErikk commented 1 year ago

Also to note- @LaszloKecskes your fix helped me with the replication error and fixed the follow issue I had after applying the first fix. I believe these both should be documented- perhaps in the troubleshooting part of the documentation I began in this PR?

fnocetti commented 1 year ago

Hi, I ran out of time with this :) It was great working with everyone

@iJustErikk thanks for your contributions! Somebody will take over from here.