carcruz opened 8 months ago
Hey guys, I haven't done any concrete research, but here's what I can gather so far.
The number one reason by far is that the pages have not yet been crawled.
For Google to index our pages, it first crawls (requests) them and then, based on the content it found, may index the page. Crawling is done by Googlebot and consumes resources, so each site gets a crawl budget. Submitting a large number of pages, heavy/slow pages, pages deemed low quality, duplicate pages, or pages we don't want indexed at all spends that budget unnecessarily and pushes the pages we do want indexed further down the queue. This may be why many of our pages have not been crawled.
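As an illustration, one common way to protect crawl budget is to exclude low-value or duplicate URL patterns via robots.txt. The paths below are hypothetical placeholders, not actual Platform routes:

```
# Hypothetical patterns; replace with real low-value Platform routes
User-agent: Googlebot
# Keep Googlebot away from parameterized/duplicate views so budget goes to canonical pages
Disallow: /search
Disallow: /api/
```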
@carcruz @prashantuniyal02 and @jdhayhurst , are we exploring the JSON-LD approach for embedded metadata in Open Targets Platform like in other life sciences resources, e.g. identifiers.org?
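For reference, a minimal sketch of what such embedded metadata might look like, using schema.org Dataset markup in the Bioschemas style. The specific fields, types, and URL structure here are illustrative assumptions, not a confirmed Platform schema:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "https://platform.opentargets.org/target/ENSG00000157764",
  "name": "BRAF (ENSG00000157764) | Open Targets Platform",
  "description": "Target profile and target-disease association evidence for BRAF.",
  "url": "https://platform.opentargets.org/target/ENSG00000157764",
  "includedInDataCatalog": {
    "@type": "DataCatalog",
    "name": "Open Targets Platform",
    "url": "https://platform.opentargets.org"
  }
}
</script>
```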
@mbdebian and @carcruz to review the relevant approach in a couple of months.
BE: sitemap.xml is generated by a command-line tool (see the sketch below)
Process improvement for every release (Prashant)
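For context, a minimal sketch of how such a command-line generator could work. This is hypothetical, not the actual BE tool: the base URL and the one-path-per-line stdin input format are assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical sitemap.xml generator sketch (not the actual BE tool)."""
import sys
from xml.etree import ElementTree as ET

BASE_URL = "https://platform.opentargets.org"  # assumed base URL

def build_sitemap(paths):
    # Root element with the standard sitemap namespace
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{BASE_URL}{path}"
    return ET.ElementTree(urlset)

if __name__ == "__main__":
    # Read one URL path per line from stdin, write sitemap.xml to stdout
    tree = build_sitemap(line.strip() for line in sys.stdin if line.strip())
    tree.write(sys.stdout.buffer, encoding="utf-8", xml_declaration=True)
```

Rerunning a tool like this as part of each release, and resubmitting the result in Search Console, would keep the sitemap in step with newly published pages.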