koniu / recoll-webui

web interface for recoll desktop search

Recoll WebUI returns 0 results when search folder other than <all> is chosen #23

Closed hagfelsh closed 10 years ago

hagfelsh commented 10 years ago

Hi! Marvelous work on this project, I just discovered recoll & webui and am absolutely delighted at its power.

I discovered something odd today, though, while using the webUI: when constraining the search scope to a subdirectory (rather than <all>), the search will return 0 results, in every case, in every directory. The Recoll search GUI itself will return expected results.

Searching for the same term from <all> will properly find the search terms, even from within the directories that return 0 results when searched exclusively.

I've reinstalled both, just in case there was a problem there somewhere, but the problem is easily duplicated.

I'm using Fedora 20 x64, Recoll 1.19.19 + Xapian 1.2.15. The WebUI version was whatever was up on January 18th, at about 12:00 GMT. The web browser is Firefox 25.0. I also tried it in Chrome 32.0.1700.76 m with the same results.

The searched material is on a CIFS share mounted on the Fedora machine. WebUI is started from the same account that owns the index (non-root).

Here is a comparison of a search for "balance" in a folder called ENG Doc Control/Docs. The base of the searched directories is mw-ksb.

recoll query: (((balance:(wqf=11) OR balancing OR balanced OR balancer OR balances OR balancers) AND (XP PHRASE 4 XPmw-ksb PHRASE 4 XPENG Doc Control PHRASE 4 XPDOCs)))

webui query: "GET /results?query=balance&dir=mw-ksbENG+Doc+ControlDOCs&after=&before=&sort=relevancyrating&ascending=0&page=1 HTTP/1.1" 200 9665

The web query in this example returned 0 results, while the Recoll UI, which was constrained to the same subfolder, returned 239.

Here's what the webUI returns when set to <all> for the same search term:

"GET /results?query=balance&dir=%3Call%3E&after=&before=&sort=relevancyrating&ascending=0&page=1 HTTP/1.1" 200 70209

Please let me know if I can provide more information. Your help is greatly appreciated!

ghost commented 10 years ago

Hi,

I'll assume that you are using Recoll 1.19.9 as .19 is not yet there :)

The query encoding seems wrong on the failing query: there should be some "%2F" pieces for the / separators. Is this a GitHub effect, or are they really missing when printed on the terminal?

An initial try at reproducing this failed (I do get a correctly encoded query and results), but I can try harder once I know that the / characters are really missing.

"GET /results?query=sac&dir=d%2Fdir+with+blanks&after=&before=&sort=relevancyrating&ascending=0&page=1 HTTP/1.1"

hagfelsh commented 10 years ago

Oops... version from the future!

I did my best to read the webui code that generates the query, but my python isn't very strong. I ran diff against the webui.py that's in the zip & what is running on my machine and the files are identical.

Is there a debug flag I can set to create more verbose logging for you? The query string I posted for the webui was from stdout, which seems to be where the app sends its logging info.

ghost commented 10 years ago

This is weird. In the query, there should be slash characters encoded as %2F. Instead, they seem to be suppressed. In other words, "dir=mw-ksbENG+Doc+ControlDOCs" should look like "dir=mw-ksb%2FENG+Doc+Control%2FDOCs" instead (and the capitalization of "DOCs" is weird too, by the way).
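For reference, here is a small sketch using only Python's standard library (it is not taken from webui.py) that shows what a form-encoded `dir` value looks like when the slashes are present, and that the value from the failing log line decodes to a name with no separators at all. The folder name is the one from the report; everything else is illustrative.

```python
from urllib.parse import urlencode, parse_qs

# Folder path as it should appear in the <option value="..."> attribute.
folder = "mw-ksb/ENG Doc Control/DOCs"

# Form encoding turns "/" into %2F and spaces into "+".
encoded = urlencode({"query": "balance", "dir": folder})
print(encoded)

# Decoding the value seen in the failing request shows the slashes
# were already missing before the browser encoded anything.
decoded = parse_qs("dir=mw-ksbENG+Doc+ControlDOCs")["dir"][0]
print(decoded)
```

If the decoded value has no slashes, the problem is in the generated HTML rather than in the browser or the logging.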

I think that it would be interesting to have a look at the generated HTML by using "show page source". I am especially interested in the "folders" section, the part which looks like:

 <b>Folder</b><br>
        <select id="folders" name="dir">
                <option style="margin-left: 0em" value="&lt;all&gt;">&lt;all&gt;</option>
                <option style="margin-left: 0em" selected value="d">d</option>
                <option style="margin-left: 2em" value="d/d1">d1</option>
                <option style="margin-left: 4em" value="d/d1/d3">d3</option>
                <option style="margin-left: 2em" value="d/d2">d2</option>
                <option style="margin-left: 2em" value="d/dir with blanks">dir with blanks</option>
                <option style="margin-left: 2em" selected value="d/ENG Doc Control">ENG Doc Control</option>
                <option style="margin-left: 4em" selected value="d/ENG Doc Control/Docs">Docs</option>
        </select><br>

(Put the data between 2 lines with 4 backquotes to prevent interpretation by GitHub, have a look at the "Markdown" link above).

hagfelsh commented 10 years ago

Oo! The capitalization in "DOCs" is correct--that's how the directory is named, for whatever reason.

Looking at your example HTML, I wonder if this has anything to do with this index being of a CIFS share mounted on a Windows file server...?

Here's the source of the page after searching for "balance" with 0 results returned, from the same directory as in my original post. I obfuscated the names of the folders, but kept any spaces or special characters that were present in the original folder names. The only non-letter characters were ( ) - . and _


<body>
<div id="fade"></div>
<div id="searchbox">
<form action="results" method="get">
<table id="form">
<tr>
    <td width="50%">
        <b>Query</b>
        <input tabindex="0" type="search" name="query" value="balance" autofocus><br><br>
        <input type="submit" value="Search">&nbsp;
        <a href="./" tabindex="-1"><input type="button" value="Reset"></a>&nbsp;
        <a href="settings" tabindex="-1"><input type="button" value="Settings"></a>
    </td>
    <td width="30%">
        <b>Folder</b><br>
        <select id="folders" name="dir">
                <option style="margin-left: 0em" value="&lt;all&gt;">&lt;all&gt;</option>
                <option style="margin-left: 0em" selected value="mw-ksb">mw-ksb</option>
                <option style="margin-left: 0em" value="mw-ksbDocumentation">mw-ksbDocumentation</option>
                <option style="margin-left: 0em" value="mw-ksbDocumentationXXXX">mw-ksbDocumentationXXXX</option>
                <option style="margin-left: 0em" value="mw-ksbDocumentationXXXX">mw-ksbDocumentationXXXX</option>
                <option style="margin-left: 0em" value="mw-ksbDocumentationXxxxxxxxx">mw-ksbDocumentationXxxxxxxxx</option>
                <option style="margin-left: 0em" value="mw-ksbDocumentationXxxxRelease">mw-ksbDocumentationXxxxRelease</option>
                <option style="margin-left: 0em" selected value="mw-ksbENG Doc Control">mw-ksbENG Doc Control</option>
                <option style="margin-left: 0em" selected value="mw-ksbENG Doc ControlDOCs">mw-ksbENG Doc ControlDOCs</option>
                <option style="margin-left: 0em" value="mw-ksbStuff">mw-ksbStuff</option>
                <option style="margin-left: 0em" value="mw-ksbStuffOneNote_RecycleBin">mw-ksbStuffOneNote_RecycleBin</option>
                <option style="margin-left: 0em" value="mw-ksbHTTP_Save">mw-ksbHTTP_Save</option>
                <option style="margin-left: 0em" value="mw-ksbMisc">mw-ksbMisc</option>
                <option style="margin-left: 0em" value="mw-ksbPt Ds">mw-ksbPt Ds</option>
                <option style="margin-left: 0em" value="mw-ksbPt Dscore">mw-ksbPt Dscore</option>
                <option style="margin-left: 0em" value="mw-ksbPt DsEHV">mw-ksbPt DsEHV</option>
                <option style="margin-left: 0em" value="mw-ksbPt Dsmirror">mw-ksbPt Dsmirror</option>
                <option style="margin-left: 0em" value="mw-ksbPt Dsplatform">mw-ksbPt Dsplatform</option>
                <option style="margin-left: 0em" value="mw-ksbPt Dstest">mw-ksbPt Dstest</option>
                <option style="margin-left: 0em" value="mw-ksbPt Pub">mw-ksbPt Pub</option>
                <option style="margin-left: 0em" value="mw-ksbRecent page">mw-ksbRecent page</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageA_HZ">mw-ksbRecent pageA_HZ</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageRemote_Mfg">mw-ksbRecent pageRemote_Mfg</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageD_Hs">mw-ksbRecent pageD_Hs</option>
                <option style="margin-left: 0em" value="mw-ksbRecent paged_ws">mw-ksbRecent paged_ws</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageH_Lo">mw-ksbRecent pageH_Lo</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageJ_Jo">mw-ksbRecent pageJ_Jo</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pagejrs">mw-ksbRecent pagejrs</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageM_Mds">mw-ksbRecent pageM_Mds</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageM_Km">mw-ksbRecent pageM_Km</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageR_En">mw-ksbRecent pageR_En</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageRn_Se">mw-ksbRecent pageRn_Se</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageR_Tp">mw-ksbRecent pageR_Tp</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageRTp">mw-ksbRecent pageRTp</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageS_Pgl">mw-ksbRecent pageS_Pgl</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pagetfl">mw-ksbRecent pagetfl</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageTime">mw-ksbRecent pageTime</option>
                <option style="margin-left: 0em" value="mw-ksbRecent pageW_He">mw-ksbRecent pageW_He</option>
                <option style="margin-left: 0em" value="mw-ksbStandards">mw-ksbStandards</option>
                <option style="margin-left: 0em" value="mw-ksbStandardsServices">mw-ksbStandardsServices</option>
                <option style="margin-left: 0em" value="mw-ksbStandardsStandards (from MXP)">mw-ksbStandardsStandards (from MXP)</option>
                <option style="margin-left: 0em" value="mw-ksbStandardsSAF-TY">mw-ksbStandardsSAF-TY</option>
                <option style="margin-left: 0em" value="mw-ksbSPL">mw-ksbSPL</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR010035003000300">mw-ksbSPLR010035003000300</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR0500350033.114">mw-ksbSPLR0500350033.114</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR050035003600318">mw-ksbSPLR050035003600318</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR050035.1000328">mw-ksbSPLR050035.1000328</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR050035.2100303">mw-ksbSPLR050035.2100303</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR0500360031.413">mw-ksbSPLR0500360031.413</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060030003500343">mw-ksbSPLR060030003500343</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060031003300304">mw-ksbSPLR060031003300304</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060032003200315">mw-ksbSPLR060032003200315</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060033.10.106">mw-ksbSPLR060033.10.106</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060034003100374">mw-ksbSPLR060034003100374</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR060034003200312">mw-ksbSPLR060034003200312</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR0600350031.159">mw-ksbSPLR0600350031.159</option>
                <option style="margin-left: 0em" value="mw-ksbSPLR0600350031.195">mw-ksbSPLR0600350031.195</option>
                <option style="margin-left: 0em" value="mw-ksbWsmall">mw-ksbWsmall</option>
                <option style="margin-left: 0em" value="mw-ksbWsmallqw">mw-ksbWsmallqw</option>
                <option style="margin-left: 0em" value="mw-ksbWsmallWsmall">mw-ksbWsmallWsmall</option>
                <option style="margin-left: 0em" value="mw-ksbWsmallwifxt">mw-ksbWsmallwifxt</option>
        </select><br>
        <b>Dates</b> <small class="gray">YYYY[-MM][-DD]</small><br>
        <input name="after" value="" autocomplete="off"> &mdash; <input name="before" value="" autocomplete="off">
    </td>
    <td>
        <b>Sort by</b>
        <select name="sort">
                <option selected value="relevancyrating">Relevancy</option>
                <option value="mtime">Date</option>
                <option value="url">Path</option>
                <option value="filename">Filename</option>
                <option value="fbytes">Size</option>
                <option value="author">Author</option>
        </select><br>
        <b>Order</b>
        <select name="ascending">
                <option value="0" selected>Descending</option>
                <option value="1">Ascending</option>
        </select>
    </td>
</tr>
</table>
<input type="hidden" name="page" value="1" />
</form>
</div>

hagfelsh commented 10 years ago

On a whim, I partially tested the CIFS theory by indexing a local directory. The resulting folder list has a distinctly different visual appearance than that of my CIFS-share directories. Each separate dir has its own line with only its name listed, rather than the concatenation of its name as well as its parents back to root.

Tomorrow, I'll map a LUN to the recoll machine and copy my index target over so it's stored locally in ext4. I suspect that everything will work as expected...

ghost commented 10 years ago

The / characters are definitely missing. I would like to try to reproduce this, as it would make a resolution easier. Could you please tell me precisely how the CIFS share is mounted? I tried with a vanilla autofs mount and things look normal...

hagfelsh commented 10 years ago

I'm not sure I understand what sort of information you'd like, so let me know if I'm missing something.

The CIFS mount only includes _netdev,ro and is mounted to a mountpoint on /. As far as what is being served, it's a Windows 2008 R2 FileSharing server sharing at full network permissions and full NTFS permissions for the user I'm using to mount it.

ghost commented 10 years ago

Thanks, I was wondering if you could have been using a fuse-based mount. I'll try to reproduce the issue, but I currently have trouble getting Fedora 20 to behave as a VirtualBox guest.

ghost commented 10 years ago

Ok, I can reproduce the problem. I can try to see what happens and look for a possible fix now.

ghost commented 10 years ago

Ok, I think it's fixed. This had nothing to do with the kind of system actually, just the fact that the top dir was directly under root (/). There is a fixed file here: https://github.com/medoc92/recoll-webui/blob/master/webui.py

Please let me know how this works for you and I'll put up a pull request.

Cheers,

jf
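To illustrate how a top directory sitting directly under / can produce exactly the slash-less names seen in the page source, here is a hypothetical reconstruction. It is only a sketch of the failure mode, not the actual code or fix in webui.py; the function names `names_buggy` and `names_fixed` are made up for this example.

```python
import posixpath

def names_buggy(topdir, dirs):
    """Strip the parent of topdir from each path by plain string replacement.

    When topdir is directly under root, the parent is "/", so the
    replacement removes every separator in the path.
    """
    parent = posixpath.dirname(topdir)
    return [d.replace(parent, "").lstrip("/") for d in dirs]

def names_fixed(topdir, dirs):
    """Compute each name relative to the parent of topdir instead."""
    parent = posixpath.dirname(topdir)
    return [posixpath.relpath(d, parent) for d in dirs]

dirs = [
    "/mw-ksb",
    "/mw-ksb/ENG Doc Control",
    "/mw-ksb/ENG Doc Control/DOCs",
]
print(names_buggy("/mw-ksb", dirs))
print(names_fixed("/mw-ksb", dirs))
```

With a deeper top directory such as /home/user/d, both functions give the same result, which would explain why the problem did not show up on other setups.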

hagfelsh commented 10 years ago

Sorry for the delay--github seems to have ceased notifying me of updates to this thread...

Fantastic! It's fixed!

Now, a related question: are the subdirectories supposed to be listed out one after another, or is there supposed to be some sort of visual or treed organization to show the parent/child relationship?

What I'm seeing in the dropdown is just a pure list of each directory name, sorted like this: parentA parentB parentC childA childB childC childAchildA childBchildA

If the above is confusing, it lists all the level 1 directories first, then the level 2 afterward, then 3 & so on.

ghost commented 10 years ago

Ok, this is weird. Here is what my folder menu looks like, it's a representation of the tree:

[Screenshot: recoll-webui-folder-menu]

This is with a recent Firefox. Do you get the same thing with Firefox and Chrome?

Cheers,

jf

hagfelsh commented 10 years ago

Looks to be a browser specific problem!

I get just a straight, non-indented pile of words from the following versions: Chrome 32.0.1700.102 m (running on Windows 7) and IE 9.0.8112.16421 (running on Windows 7).

However, it does display correctly in Firefox 22.0 (running on Win 7) and 25.0 (running on Fedora). Also on FF ESR 10.0.5 (running on CentOS).

Writing for multiple browsers must be miserable!

ghost commented 10 years ago

Yes, it must be awful. Happily enough, I'm more of a desktop programmer...

Anyway, while koniu seems to be away, I have changed the indentation method to something ugly but which should work on all browsers (hopefully). The modified file is here: https://github.com/medoc92/recoll-webui/blob/master/views/search.tpl

I'll create a pull request, but I really hope that a nicer solution can be found...
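One common way to indent entries in a select list without relying on CSS margins on option elements is to prefix each label with non-breaking spaces. The sketch below shows that idea in Python; whether the modified search.tpl does this is an assumption, and `folder_options` is a hypothetical helper, not a function from the project.

```python
from html import escape

def folder_options(dirs, selected=None):
    """Render <option> tags in tree order, indented with &nbsp; by depth."""
    lines = []
    # Sorting by path components keeps each child right after its parent.
    for d in sorted(dirs, key=lambda p: p.split("/")):
        depth = d.count("/")
        # Four non-breaking spaces per level; only the last component is shown.
        label = "&nbsp;" * (4 * depth) + escape(d.rsplit("/", 1)[-1])
        sel = " selected" if d == selected else ""
        lines.append(f'<option value="{escape(d)}"{sel}>{label}</option>')
    return lines

for line in folder_options(["d/d2", "d", "d/d1/d3", "d/d1"], selected="d/d1"):
    print(line)
```

The value attribute keeps the full path with its slashes, so the submitted dir parameter is unaffected by how the label is displayed.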

koniu commented 10 years ago

Assuming this is fixed. Thanks medoc

hagfelsh commented 10 years ago

Sorry for my delay in replying, guys--I'll apply this patch and report back.

hagfelsh commented 10 years ago

Fixed!

Thank you guys for your help. This thing is a masterpiece!