picocms / Pico

Pico is a stupidly simple, blazing fast, flat file CMS.
http://picocms.org/
MIT License

Files in subdirectory not seen #479

Closed: omniperspective closed this issue 5 years ago

omniperspective commented 5 years ago

Installation: Pico 2.0.5-beta.1 (also seen on 2.0.0-beta-3)

++++ Some info ++++
[jrdschwrdbknl@web001 public_html]$ ls -l content/
totaal 96
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 154 jun 18 2018 404.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 510 sep 5 21:45 afkortingen.md
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 sep 7 23:48 algemeen

[jrdschwrdbknl@web001 public_html]$ cd content/algemeen/
[jrdschwrdbknl@web001 algemeen]$ ls -l
totaal 88
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 1040 sep 7 23:48 advocaat-belastingrecht.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 941 sep 7 23:48 advocaat-familierecht.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 1161 sep 7 23:48 advocaat-financieel-recht.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 1189 sep 7 23:48 balie.md


Issue: the file "afkortingen.md" in the root of the 'content' directory is served, and so is the file "balie.md" in the subdirectory 'content/algemeen'.

But crawlers such as SeoFrog Spider, or sitemap scanners like https://www.xml-sitemaps.com/, can't see any of the files in any subdirectory of /content/.

What I don't understand is that addressing a file directly works, but scanning a directory under content doesn't reveal any files. I need this because Google Webmaster Tools randomly rejects these files as 'unreadable' if I submit a sitemap that lists all of them. Also, the PicoRobots plugin doesn't show all files.

What am I missing here? Please help; any pointer is helpful.

PhrozenByte commented 5 years ago

External third-party sitemap scanners rely on recursively following the links on your website in the hope of finding all pages. This is a very error-prone process and often doesn't yield the expected results, so you should use a plugin instead. To create a sitemap.xml automatically, you can use Pico's official PicoRobots plugin.
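To illustrate why link-following crawlers can miss pages, here is a minimal breadth-first crawler over a hypothetical, in-memory link graph. The page paths and links are invented for illustration and do not describe the real site. Any page that no reachable page links to is never discovered, whereas a generated sitemap.xml can list it directly.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# "/orphan" exists on the server, but no reachable page links to it.
LINKS = {
    "/": ["/about", "/glossary"],
    "/about": ["/"],
    "/glossary": ["/", "/glossary/term-a"],
    "/glossary/term-a": ["/glossary"],
    "/orphan": ["/"],
}


def crawl(start):
    """Breadth-first crawl: return every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl("/")
missed = set(LINKS) - found
print(sorted(found))   # ['/', '/about', '/glossary', '/glossary/term-a']
print(sorted(missed))  # ['/orphan']
```

A real crawler fetches HTML and extracts links instead of reading a dictionary, but the discovery logic is the same: a page is found only if something it can reach links to it.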

> Also the plugin PicoRobots won't show all files.

What does this mean in particular? Can you provide a link?

omniperspective commented 5 years ago

I have the PicoRobots plugin working, and it shows all files.

But https://www.xml-sitemaps.com/ and SeoFrog Spider still can't see all files.

The site is https://juridisch-woordenboek.nl, a legal dictionary for students, built with DataTables and Pico.

It drives me nuts, and I don't know how to diagnose this problem; that last part is the most frustrating.

Info:

These are the directories and their file permissions:

[jrdschwrdbknl@web001 content]$ pwd
/home/jrdschwrdbknl/public_html/content
[jrdschwrdbknl@web001 content]$ ls -l
totaal 96
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 232 jan 11 22:10 404.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 499 jan 11 22:10 afkortingen.md
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 algemeen
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 arbeidsrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 bestuursrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 bouwrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 erfrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 faillissementsrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 goederenrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 huurrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 incasso
-rw-rw-r-- 1 jrdschwrdbknl jrdschwrdbknl 471 jan 11 22:10 index.md
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 mediarecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 ondernemingsrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 onderwerp-intellectueel-eigendom
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 procesrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 sub
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 519 jan 11 22:10 titulatuur.md
-rw-r--r-- 1 jrdschwrdbknl jrdschwrdbknl 1749 jan 11 22:11 uitleg.md
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 vastgoedrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 12288 jan 11 22:03 verbintenissenrecht
drwxrwxr-x 2 jrdschwrdbknl jrdschwrdbknl 4096 jan 11 22:03 vve
[jrdschwrdbknl@web001 content]$

PhrozenByte commented 5 years ago

Since your website works completely fine, your sitemap.xml lists all pages, and all pages are accessible, the problem lies with these third-party services not working as expected. You should contact the customer support of these services.

#edit: Check your robots.txt; it explicitly disallows crawling your entire website:

User-agent: *
Disallow: /
Disallow: /config/
Allow: /tmp
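
To check this locally, Python's standard-library `urllib.robotparser` can evaluate the rules quoted above. This is a minimal sketch: the crawler name `ExampleBot` and the page path `/algemeen/balie` (assuming URL rewriting maps content/algemeen/balie.md there) are hypothetical. Note that this parser applies the first matching rule, whereas Google applies the most specific one, so the two can disagree about `Allow: /tmp`; for every other path, both treat `Disallow: /` as blocking the whole site.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Disallow: /config/
Allow: /tmp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name and paths, for illustration only.
for path in ["/", "/algemeen/balie", "/tmp"]:
    # The first matching rule wins in this parser, and "Disallow: /" matches every path.
    print(path, parser.can_fetch("ExampleBot", path))  # prints False for each path
```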

omniperspective commented 5 years ago

I have partly fixed the robots.txt by adding the 'robot name', but nothing changes. Contacting SeoFrog Spider is a good suggestion, BUT if I give SeoFrog a spin on https://pico-cms.org it works ;-) I don't know which version that site is running, though. It must be something else, but again I don't know how to diagnose this.

PhrozenByte commented 5 years ago

It's most likely still the robots.txt: remove the Disallow: / rule. If it still doesn't work, contact the customer support of these third-party services; this is not an error in Pico.
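The same standard-library parser can compare the two approaches discussed here. This is a sketch under stated assumptions: `ExampleBot` and `OtherBot` are hypothetical crawler names, `/algemeen/balie` is a hypothetical page path, and the first file assumes the 'robot name' was added as its own `User-agent` group while the catch-all group kept `Disallow: /`. Real crawlers may differ from Python's parser in edge cases.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, path):
    """Return whether `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical: a named group was added, but the catch-all group still blocks everything.
NAMED_GROUP = """\
User-agent: ExampleBot
Allow: /

User-agent: *
Disallow: /
"""

# Suggested fix: drop "Disallow: /" and keep the narrower rules.
FIXED = """\
User-agent: *
Disallow: /config/
Allow: /tmp
"""

print(allowed(NAMED_GROUP, "ExampleBot", "/algemeen/balie"))  # True: only the named bot
print(allowed(NAMED_GROUP, "OtherBot", "/algemeen/balie"))    # False: every other crawler
print(allowed(FIXED, "OtherBot", "/algemeen/balie"))          # True
print(allowed(FIXED, "OtherBot", "/config/config.yml"))       # False: still excluded
```

A named group only helps the one crawler it names; any crawler that falls back to the `*` group still sees `Disallow: /`, which is why removing that rule is the more general fix.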

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two days if no further activity occurs. Thank you for your contributions! :+1: