whoot / Typo3Scan

Enumerate Typo3 version and extensions
GNU General Public License v2.0
169 stars, 32 forks

Extension version search #28

Closed snuxs closed 2 years ago

snuxs commented 2 years ago

Finding the wrong extension version even though a file with the right version exists

I am trying to scan for TYPO3 and extension versions. The issue is that for a few extensions the scanner seems to be going in the wrong direction.

The scanner uses the path [url]/Documentation/ChangeLog to find the version of the extension. From there it uses the "Last Modified" date as the version. This only happens for some extensions.

Furthermore, there are files with the correct extension version at [url]/Documentation/Settings.cfg. From my understanding these are also searched, but the scanner dismisses their result. (Maybe I am wrong here, but that is how it looks in the "extensions.py" file.)

If I remove the line where the scanner opens [url]/Documentation/ChangeLog, it still does not use the [url]/Documentation/Settings.cfg path. Instead, the scanner uses [url]/CHANGELOG.md. In my case that file contains no version information, but the scanner takes the first number it can find (an issue with the CHANGELOG.md, not the scanner).

My question is whether there is any way to change the priority of the scanned paths. I do not understand why the [url]/Documentation/ChangeLog path is used and not [url]/Documentation/Settings.cfg.

Extensions I have had this problem with

Thank you for your help and your tool!

whoot commented 2 years ago

Hi, thank you for pointing this out. Short answer is: version identification for extensions sucks and yes, Typo3Scan needs to be adapted to add priority to the paths. Maybe I can figure something out.

Long answer is: version identification for extensions sucks, because there are many issues with the files. The main issues I found are:

  1. You can't use PHP files for version detection (obviously). So you need to check for changelogs, settings files and so on. However, Typo3 restricts access to most of them by default.
  2. Extension developers tend not to update version numbers or descriptions on each update. Even if you could identify a version string, it does not mean that it is actually the version in use. Just download a bunch of extensions and check for yourself.
  3. Version information is not consistent. Developers use what they want. Some use a date, some actual version numbers, some may even just add a short text, and some don't track versions at all.

My solution to this was: download all extensions and get the most common files that could include version information (these are the ones in extensions.py). If such a file exists, report it. If no pattern is specified for it in extensions.py, a generic regex ('([0-9]+.[0-9]+.[0-9x][0-9x]?)') is used to search for version info. This may be the reason you see the "last modified date".
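To illustrate why a generic pattern can end up reporting a date, here is a minimal Python sketch. It takes the pattern exactly as quoted above, where an unescaped `.` matches any character. Whether the dots are escaped in extensions.py is not confirmed in this thread, so an escaped variant is shown for comparison. The sample strings and the helper name `first_version` are made up for illustration.

```python
import re

# Pattern exactly as quoted in the comment: "." matches ANY character here.
GENERIC_AS_QUOTED = re.compile(r"([0-9]+.[0-9]+.[0-9x][0-9x]?)")
# Assumed stricter variant with literal dots, for comparison only.
GENERIC_ESCAPED = re.compile(r"([0-9]+\.[0-9]+\.[0-9x][0-9x]?)")


def first_version(pattern, text):
    """Return the first captured version-like string, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1) if match else None


date_text = "Last modified: 2021-03-15"   # made-up sample
version_text = "Version 7.0.10"           # made-up sample

print(first_version(GENERIC_AS_QUOTED, date_text))     # 2021-03-15
print(first_version(GENERIC_ESCAPED, date_text))       # None
print(first_version(GENERIC_AS_QUOTED, version_text))  # 7.0.10
print(first_version(GENERIC_ESCAPED, version_text))    # 7.0.10
```

With unescaped dots, any digits-separator-digits-separator-digits sequence qualifies, so a date in the response body is reported as if it were a version.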

As soon as a version string is found, the scanner stops requesting other files. This is probably the reason why the settings.cfg is not requested in your case.
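The stop-at-first-match behaviour can be sketched roughly as follows. This is not Typo3Scan's actual code: the function `find_first_version`, the `fetch` callable, the path order, and the fake responses are all assumptions made for illustration, and the generic pattern is written with escaped dots.

```python
import re

# Generic fallback pattern; escaping the dots is an assumption of this sketch.
GENERIC = re.compile(r"([0-9]+\.[0-9]+\.[0-9x][0-9x]?)")


def find_first_version(paths, fetch, pattern=GENERIC):
    """Return (path, version) for the first file that yields a match, else None."""
    for path in paths:
        body = fetch(path)        # hypothetical callable: response text or None
        if body is None:
            continue              # file missing or access restricted
        match = pattern.search(body)
        if match:
            # Stop here: the remaining paths are never requested.
            return path, match.group(1)
    return None


# Made-up stand-in for HTTP responses, so the sketch runs offline.
FAKE_SITE = {
    "/Documentation/ChangeLog": "Updated 12.03.21",
    "/CHANGELOG.md": "## 7.0.10",
}
requested = []


def fake_fetch(path):
    requested.append(path)
    return FAKE_SITE.get(path)


result = find_first_version(
    ["/ChangeLog.txt", "/Documentation/ChangeLog", "/CHANGELOG.md"], fake_fetch
)
print(result)     # ('/Documentation/ChangeLog', '12.03.21')
print(requested)  # ['/ChangeLog.txt', '/Documentation/ChangeLog']
```

In this made-up run the date-like string in the earlier file wins, and `/CHANGELOG.md`, which holds the real version, is never requested. That is why the order of the paths matters.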

whoot commented 2 years ago

Just an example to illustrate my point: metaseo_tqseo_migration

The version is 1.0.0/stable, but this version is nowhere in the extension files. The only version info in Settings.yaml is

version: 6.0
release: 6.0.0

which is basically the supported Typo3 version. So yeah... extension versions are not reliable and it's a mess.

snuxs commented 2 years ago

Hi, thank you for your fast and complete answer!

Does the "Last Modified" date have to be in a file, or can it also be the date when a file in the searched directory was modified? That was also something that confused me.

I thought it worked something like that, but I missed that the scanner stops requesting other files.

I tried to remove all lines except the line where it searches through settings.cfg and now the scanner does not find a version at all. After that I tried a request and checked the response of settings.cfg and there was a match for the RegEx (?:release:)\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)

I suppose that is an issue on my (web)site, because the same request works for other extensions.

But thank you for your time and help. With that knowledge I will try to force the settings.cfg request for the mask extension.

whoot commented 2 years ago

> Does the "Last Modified" date have to be in a file, or can it also be the date when a file in the searched directory was modified? That was also something that confused me.

Nah, this cannot be the case. You just request the file and search the file content for version info. You don't have access to the file system.

> After that I tried a request and checked the response of settings.cfg and there was a match for the RegEx (?:release:)\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)

Interesting. And the version info is not reported by Typo3Scan? What's done after the match?

snuxs commented 2 years ago

> Nah, this cannot be the case. You just request the file and search the file content for version info. You don't have access to the file system.

Alright, good to know.

> Interesting. And the version info is not reported by Typo3Scan? What's done after the match?

Actually, I am trying to reproduce it at the moment but cannot figure out what I did before. I just tried it in the console. Now I am trying to match it again with the regex, but it does not match. Sorry, I don't know what I did there.

Still, in my understanding it should match (although I'm not too good with regex), but maybe you see the issue.

Here are the first 4 lines of the response from the Settings.cfg:

[general]
project = Mask
release = 7.0.10
copyright = 2021

whoot commented 2 years ago

Yeah, matching regex is:

(?:release\s=)\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)

And to be able to also catch release: 7.0.10 you should use:

(?:release\s?[=:])\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)

Edit: there are plenty of regex validators out there. E.g. https://extendsclass.com/regex-tester.html#python
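As a quick offline check of the patterns discussed above, the following Python sketch runs the earlier `release:` pattern and the combined `[=:]` pattern against the Settings.cfg lines posted in this thread and against the `release: 6.0.0` line from the earlier Settings.yaml example. The names `OLD`, `NEW` and `extract` are hypothetical. This only demonstrates the regexes and is not code from Typo3Scan.

```python
import re

# Pattern that only accepts "release:".
OLD = re.compile(r"(?:release:)\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)")
# Combined pattern that accepts "release =", "release=", and "release:".
NEW = re.compile(r"(?:release\s?[=:])\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)")

# First four lines of the Settings.cfg response posted in the thread.
SETTINGS_CFG = """[general]
project = Mask
release = 7.0.10
copyright = 2021
"""
# Colon-style lines from the Settings.yaml example earlier in the thread.
SETTINGS_YAML = "version: 6.0\nrelease: 6.0.0\n"


def extract(pattern, text):
    """Return the captured release string, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract(OLD, SETTINGS_CFG))   # None
print(extract(NEW, SETTINGS_CFG))   # 7.0.10
print(extract(OLD, SETTINGS_YAML))  # 6.0.0
print(extract(NEW, SETTINGS_YAML))  # 6.0.0
```

One limit of the pattern as written: the last component captures at most two digits, so a release such as 7.0.100 would be reported as 7.0.10.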

snuxs commented 2 years ago

Awesome! Both regexes work for me. Thank you!

> (?:release\s?[=:])\s?([0-9]+\.[0-9]+\.?[0-9]?[0-9]?)

If I use this regex in the scanner, it works for both "release:" and "release=", so I probably will not run into any issues in the normal case, I guess?

> Edit: there are plenty of regex validators out there. E.g. https://extendsclass.com/regex-tester.html#python

Will use that next time, thank you for the hint!

whoot commented 2 years ago

> If I use this regex in the scanner, it works for both "release:" and "release=", so I probably will not run into any issues in the normal case, I guess?

Yes, this will work. Making a push request right now.

whoot commented 2 years ago

Could you please use the dev branch and see if it works as intended?

snuxs commented 2 years ago

On it. First test without removing the Changelog lines gave me the date again. I am trying again.

> Does the "Last Modified" date have to be in a file, or can it also be the date when a file in the searched directory was modified? That was also something that confused me.

> Nah, this cannot be the case. You just request the file and search the file content for version info. You don't have access to the file system.

Sorry, I have to ask again, I did not completely get it. If I look in "/mask/Documentation/ChangeLog/", I can see 2 folders and one "file.rst". I don't think the scanner searches through the folders, but inside the "file.rst" there is no date at all. The version output I get is the last-modified date of the file and folders (all were modified on the same date).

Furthermore, if the scanner requests a file and reads its content, how can it get version information out of "/Documentation/Changelog" if it is not looking for a file like "/Documentation/Changelog/file.xyz"?

snuxs commented 2 years ago

> Could you please use the dev branch and see if it works as intended?

Worked as intended for my problem with the mask extension!

This will probably also solve the problems for MetaSEO and [clickstorm]SEO. I will report on that tomorrow after the scans are finished.

whoot commented 2 years ago

> Sorry, I have to ask again, I did not completely get it. If I look in "/mask/Documentation/ChangeLog/", I can see 2 folders and one "file.rst". I don't think the scanner searches through the folders, but inside the "file.rst" there is no date at all. The version output I get is the last-modified date of the file and folders (all were modified on the same date).

> Furthermore, if the scanner requests a file and reads its content, how can it get version information out of "/Documentation/Changelog" if it is not looking for a file like "/Documentation/Changelog/file.xyz"?

Regexes are used to search for version info in the following files:

/doc/manual.sxw
/composer.json
/doc/manual.pdf
/doc/manual.odt
/Documentation/Settings.yml
/Documentation/Settings.yaml
/Documentation/Settings.cfg
/ChangeLog.txt
/Documentation/ChangeLog
/CHANGELOG.md

Your reported version must be somewhere in one of them. Reading and understanding the source code will also help to answer your questions.
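When it is unclear which of these files produced a reported version, one way to debug it is to collect every match instead of stopping at the first. The sketch below does that for the paths listed above. The `collect_matches` function, the `fetch` callable, the fake responses, and the escaped generic pattern are all assumptions made for illustration and are not part of Typo3Scan. It also treats every response as plain text, which would not hold for the binary manual files.

```python
import re

# Generic pattern with escaped dots (an assumption of this sketch).
GENERIC = re.compile(r"([0-9]+\.[0-9]+\.[0-9x][0-9x]?)")

# Paths as listed in the comment above, in the same order.
VERSION_FILES = [
    "/doc/manual.sxw",
    "/composer.json",
    "/doc/manual.pdf",
    "/doc/manual.odt",
    "/Documentation/Settings.yml",
    "/Documentation/Settings.yaml",
    "/Documentation/Settings.cfg",
    "/ChangeLog.txt",
    "/Documentation/ChangeLog",
    "/CHANGELOG.md",
]


def collect_matches(fetch, paths=VERSION_FILES, pattern=GENERIC):
    """Map each path that yields a match to its first version-like string."""
    hits = {}
    for path in paths:
        body = fetch(path)   # hypothetical callable: response text or None
        if body is None:
            continue
        match = pattern.search(body)
        if match:
            hits[path] = match.group(1)
    return hits


# Made-up responses so the sketch runs without network access.
FAKE_SITE = {
    "/Documentation/Settings.cfg": "[general]\nproject = Mask\nrelease = 7.0.10\n",
    "/Documentation/ChangeLog": "Updated 12.03.21",
}

hits = collect_matches(FAKE_SITE.get)
for path, version in hits.items():
    print(path, "->", version)
```

Comparing the collected candidates side by side makes it easier to see which file contributed a date-like string and which one holds the real release number.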

snuxs commented 2 years ago

Thanks for the explanation. I probably overlooked it in the files. But the scanner also outputs a .json where it also documents the path to the version file and that path is "/Documentation/Changelog", so no exact file. I worked around it, was just curious.

The updated regex works really well for me. I am able to find the mask, MetaSEO and [clickstorm]SEO versions in most cases now! Thank you for your help!