precice / precice.github.io

The website of preCICE
https://precice.org/

Updating the Algolia search index: Record is too big #388

Open MakisH opened 5 months ago

MakisH commented 5 months ago

Trying to update our Algolia search index, I got the following error:

Rendering to HTML (100%)
[✗ Error] Record is too big

The jekyll-algolia plugin detected that one of your records exceeds the 10.00 Kb 
record size limit.                                                               

title:    Overview of adapters                                                   
url:      /adapters-overview.html                                                
size:     9.98 Kb                                                                

Most probable keys causing the issue:                                            
   html (6.84 Kb), content (2.29 Kb), keywords (0.06 Kb)                         

Complete log of the record has been extracted to:                                
   /home/gc/repos/precice/website/jekyll-algolia-record-too-big.log              

This issue can be caused by malformed HTML preventing the parser to correctly    
grab the content of the nodes. Double check that the page actually renders       
correctly with a regular `jekyll build`.                                         

You can also exclude the page generating this error from the indexing by editing 
the `files_to_exclude` key of your config.                                       

If you think this is an error and your current Algolia plan should allow you to  
push records bigger than 10.00 Kb, you can change the `max_record_size` config   
option to increase the limit. Paid plans have a limit set to 20Kb, while free    
Community plans have it set to 10Kb.                                             

The following documentation might help you:                                      
   - https://community.algolia.com/jekyll-algolia/options.html#files-to-exclude  
   - https://community.algolia.com/jekyll-algolia/options.html#nodes-to-index    
   - https://community.algolia.com/jekyll-algolia/options.html#max-record-size   

Log: jekyll-algolia-record-too-big.log (9.98 Kb, right at the 10 Kb limit)
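The check behind this error can be approximated in a few lines. The following is a minimal Python sketch, not the plugin's code (jekyll-algolia is a Ruby gem). It makes three assumptions:

- a record's size is the UTF-8 byte length of its compact JSON serialization,
- the limit is 10,000 bytes,
- the per-key figures are each value's serialized size.

The record below and the helper names `json_size` and `check_record` are hypothetical.

```python
import json

# Assumed limit in bytes, matching the "10.00 Kb" free-plan figure in the log.
# The real plugin may account for bytes slightly differently.
MAX_RECORD_SIZE = 10_000


def json_size(value) -> int:
    """UTF-8 byte length of the compact JSON serialization of `value`."""
    text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def check_record(record: dict, limit: int = MAX_RECORD_SIZE):
    """Return (total size, [(key, size), ...] largest first) if the record
    exceeds `limit`, otherwise None."""
    total = json_size(record)
    if total <= limit:
        return None
    per_key = sorted(
        ((key, json_size(value)) for key, value in record.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return total, per_key


# Hypothetical record: the highlighted HTML is much larger than the plain text.
record = {
    "title": "Overview of adapters",
    "url": "/adapters-overview.html",
    "html": '<span class="n">x</span>' * 400,
    "content": "x " * 400,
}

report = check_record(record)
if report is not None:
    total, keys = report
    print(f"Record is too big: {total} bytes (limit {MAX_RECORD_SIZE})")
    for key, size in keys[:3]:
        print(f"  {key}: {size} bytes")
```

Sorting the keys by size mirrors the "Most probable keys causing the issue" line in the log. In the log above, that line points at `html` rather than `content`.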

After reducing the size of the (borderline) adapters-overview.html, I get the same error for another file: jekyll-algolia-record-too-big.log (17.59 Kb, over the 10 Kb limit, corresponding to this file). That page is probably indeed a bit too complex.

I assume the same applies to several more files.

Both pages seem to have no validation errors:

Since I did not face similar issues the last time I updated the index (around March), I am wondering:

Algolia documentation

chlorenz commented 5 months ago

Hi @MakisH, I've started to investigate this issue and so far I can say this:

Let's take the second document, dev-publications.md, as an example. Each search record is one HTML block of the rendered page, here dev-docs-publication-strategy.html. In this particular case, the offending record is the <code></code> block after step '4. Download components'. Which HTML elements become records is configured by the `nodes_to_index` option.

Because syntax highlighting is enabled, the rendered code consists of many <span></span> elements, which blow up the size of the record (as the log files show). This is why we hit the 10 kB limit.
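To see why highlighting inflates the `html` key so much, here is a hypothetical toy highlighter in Python. It wraps every whitespace-separated token in a `<span class="tok">` element.

This is an illustration only. The real highlighter (Rouge is Jekyll's default) uses its own token classes and granularity. The shell snippet and the example.com URLs are made up.

```python
import html
import re


def toy_highlight(code: str) -> str:
    """Hypothetical highlighter: wrap each non-whitespace token in a span,
    keeping the whitespace (spaces, newlines) between tokens untouched."""
    pieces = []
    for part in re.split(r"(\s+)", code):
        if not part or part.isspace():
            pieces.append(part)
        else:
            pieces.append(f'<span class="tok">{html.escape(part)}</span>')
    return "<pre><code>" + "".join(pieces) + "</code></pre>"


# Made-up download script standing in for a long shell listing.
snippet = "\n".join(
    f"wget https://example.com/component-{i}.tar.gz && tar -xzf component-{i}.tar.gz"
    for i in range(20)
)

highlighted = toy_highlight(snippet)
plain_size = len(snippet.encode("utf-8"))
html_size = len(highlighted.encode("utf-8"))
print(f"plain: {plain_size} bytes, highlighted: {html_size} bytes "
      f"({html_size / plain_size:.1f}x)")
```

The visible text is unchanged, but every token now carries 25 bytes of span markup. As a result, the HTML form of this listing is about three times the size of its plain text.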

This change was introduced in d926e2c last week.

Would the easiest fix be to break up the code block? I don't see how code blocks could be broken up into smaller pieces, though.

(I've also found this page in the Algolia docs suggesting that some (newer) plans allow records larger than 10 kB.)

MakisH commented 5 months ago

Thanks for investigating!

If we exclude this file, do we hit the error in several more files? If this is the only instance, the easiest fix would be to move this code to a file in some repository, or simply to disable syntax highlighting.

What interests me more is whether anything bigger changed recently that now affects several files.

chlorenz commented 5 months ago

In fact, it turns out this is exactly the answer to #300, and it has happened before; see https://github.com/precice/precice.github.io/commit/74e377cece4a221e00b5c56b1db3942ec70a6272 😆

In this specific case, the code can be split into two blocks, as in https://github.com/precice/precice.github.io/commit/74e377cece4a221e00b5c56b1db3942ec70a6272. I agree that, say, a very involved shell script could be hosted as an external file.
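Splitting by hand works for one page. If more code blocks hit the limit, the same idea could be scripted. The following is a minimal sketch, assuming a greedy line-based split is acceptable: it groups consecutive lines into chunks whose size stays within a byte budget.

The function name `split_code_block` and the budget are hypothetical. In practice, a split should still land at a logical boundary of the script rather than wherever the budget runs out.

```python
from typing import Callable, List


def utf8_size(text: str) -> int:
    return len(text.encode("utf-8"))


def split_code_block(code: str, budget: int,
                     size: Callable[[str], int] = utf8_size) -> List[str]:
    """Greedily group consecutive lines into chunks with size(chunk) <= budget.

    A single line that is larger than the budget on its own becomes its own
    chunk, since it cannot be split without changing the code."""
    chunks: List[str] = []
    current: List[str] = []
    for line in code.splitlines():
        candidate = "\n".join(current + [line])
        if current and size(candidate) > budget:
            # Adding this line would exceed the budget: close the current chunk.
            chunks.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


# Hypothetical long script, split into pieces of at most 200 bytes each.
script = "\n".join(f"echo 'step {i}'" for i in range(50))
for n, chunk in enumerate(split_code_block(script, budget=200), start=1):
    print(f"--- block {n} ({utf8_size(chunk)} bytes) ---")
    print(chunk)
```

A `size` function that measures the highlighted HTML instead of the raw text would make the budget track the record's `html` key, which is the part that grows with highlighting.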

uekerman commented 5 months ago

The open-source plan could be a solution; see also #237.

MakisH commented 4 months ago

Today, I requested access to the "Algolia for Open Source" plan via the on-platform "contact sales" feature. Waiting for an answer.

Edit: I got an answer three days later, pointing me to a form that also asked for some usage details. On August 29, I submitted the form, requesting 20k records (we currently have 8.36k).

fsimonis commented 4 weeks ago

Update: we got the open-source plan.