Integrating The Old Tools

tajmone / PBCodeArcProto

PB CodeArchiv Rebirth Indexer Prototype

Integrating The Old Tools #10

Open tajmone opened 6 years ago

tajmone commented 6 years ago

UPDATE — Functionality from the Old Tools is currently being integrated into the mod_Resources.pbi module, tested via the CheckResources.pb app and documented in MODULARIZATION_NOTES.md.

From your dev branch I've taken the two original check-and-clean tools that you mentioned and placed them in the _tempwork folder:

This way I can study and test them more closely in order to plan their integration into the final app. I've created a markdown doc for each tool, where I'll annotate its current operations and how they could be adapted to the new needs:

I'll proceed this way:

[x] create a list of all the operations each tool carries out
[ ] try to isolate which operations are common to more than one tool
- [x] plan the modularization of such operations
[ ] work out which operations need to be updated to work with the new system

Single-App Integration Considerations

I'm still trying to imagine their real-world usage, and wondering whether aggregating them into a single app might add overhead to some maintenance operations. I'm saying this because I can imagine that at some point a developer might wish to work on a single aspect of the project (eg, cosmetic changes to the HTML layout), and having the app forcefully carry out all tests (especially checks on every single resource main file) could add a huge overhead in terms of time.

[ ] Implement flags to allow circumventing checks for testing purposes (See below).

A Contributors Test Tool?

Also, I was thinking that even though the HTML pages creator app is intended for project maintainers only, it would be a good idea to also provide a tool for end users wishing to contribute a resource: they would use this tool to check the integrity of their resource before submitting/updating it to the project. In this case, the tool would have to target a single resource, and only check that it satisfies all requirements.

Such a tool should reduce the number of pull requests that need fixing, as the end user would take care of problems before making the PR.

[ ] Create a tool for checking the integrity of isolated resources.

Travis Continuous Integration

Obviously, the ideal solution would be to use [Travis CI] to carry out such tests automatically at pull request time. The only downside is that this would only work on GitHub, and the code for carrying out the checks would have to be written in one of the languages supported by Travis:

https://docs.travis-ci.com/user/languages/

... but it might be well worth having a look at it, if not now maybe later on. Automating some Git-related steps would be a great addition to the project workflow.

Besides, some scripted tools can also be added to the Git project internals, allowing Git to run the scripts during or after certain operations.

I haven't actually used either of these, but I did peek at them out of curiosity and know that they can be used for the purpose.

[Travis CI]: https://travis-ci.org
[CheckResources.pb]: https://github.com/tajmone/PBCodeArcProto/blob/master/_assets/CheckResources.pb

tajmone commented 6 years ago

ERRATA CORRIGE: I've found some comments in the main code! I didn't see them at first.

Grrrr 😡 ... the tools' source files don't contain ~~any~~ many comments summarizing what does what.

I'll have to read their code line by line to work out what the tools actually do.

I know I'm often accused of overdoing the comments in my source files; but why so few comments ~~at all~~?

tajmone commented 6 years ago

The CodesChecker Tool

Ok, I've gone through the CodesChecker tool and documented the list of check operations that I've so far understood it carries out:

https://github.com/tajmone/PBCodeArcProto/blob/master/_tempwork/CodesChecker.md#check-operations

I've also listed there the things I didn't understand.

Initial Integration Considerations

Many of the checks on the header comments are already carried out by the current HTML builder, or could easily be added to it in the procedures that handle the key-value extraction:

[x] check that a code header-comments block is present
[ ] check that the following keys are present:
- [ ] Author
- [ ] Date
- [ ] Description
- [ ] English-Forum
- [ ] French-Forum
- [ ] German-Forum
- [ ] OS
these checks are easy to add to the current code.
[ ] check well-formedness of values for the following keys:
- [ ] Date » ^\d{4}-\d{2}-\d{2}$ (= YEAR-MM-DD)
these checks are also easy to add to the current code.
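
A minimal sketch of the key-presence and Date checks listed above. It is written in Python purely for illustration (the project itself is PureBasic), and it assumes a hypothetical header layout of leading "; Key: value" comment lines; the real header format and the names `parse_header` and `check_header` are not taken from the project.

```python
import re

# Keys from the checklist above. The "; Key: value" layout is an assumption.
REQUIRED_KEYS = ["Author", "Date", "Description", "English-Forum",
                 "French-Forum", "German-Forum", "OS"]
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YEAR-MM-DD, as in the checklist
HEADER_LINE_RE = re.compile(r"^;\s*([A-Za-z-]+)\s*:\s*(.*)$")


def parse_header(source: str) -> dict:
    """Collect key-value pairs from the leading block of ';' comment lines."""
    pairs = {}
    for line in source.splitlines():
        if not line.startswith(";"):
            break  # the header block ends at the first non-comment line
        match = HEADER_LINE_RE.match(line)
        if match:
            pairs[match.group(1)] = match.group(2).strip()
    return pairs


def check_header(source: str) -> list:
    """Return a list of problems; the list is empty when all checks pass."""
    pairs = parse_header(source)
    if not pairs:
        return ["no header-comments block found"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS
                if key not in pairs]
    if "Date" in pairs and not DATE_RE.match(pairs["Date"]):
        problems.append(f"malformed Date: {pairs['Date']!r}")
    return problems
```

Returning a list of problems rather than stopping at the first one would let a contributor fix everything in a single pass.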

Other check tasks should be added at STEP 2 (project integrity check):

[ ] Check code syntax via the compiler syntax checker (--check --thread)
[ ] If resource is an include file ("*.pbi"):
- Check for presence of CompilerIf #PB_Compiler_IsMainFile block

... as for these last two checks, I need to understand them better:

How is code for other OSs going to be checked?
Cross-platform code can only be checked for the current OS!
How strict is the check for the presence of the CompilerIf #PB_Compiler_IsMainFile block?
- Does it mean that all include files should contain some test code to run them on their own?
- Should the lack of this block prevent the resource's inclusion in the project?

Additional Checks Suggested

I also suggest the following checks:

PB Sources Encoding

[ ] all files should be UTF-8 with BOM

My understanding is that this is the default setting for the PB IDE. The alternative is Ascii, which is rather obsolete (and Ascii is UTF-8 compatible anyway).

As for the BOM, using it in UTF-8 files is a questionable choice: it's not recommended, but it's not forbidden either. Many Microsoft code tools use or require the BOM in UTF-8 files.

Also, PB IDE will always add the BOM to a source file when it saves it, even if you removed the BOM from it manually (I've tested this, and I'm 100% positive about it).

So, I guess we'll have to leave the BOM there to accommodate the default settings of most PB users.

You mentioned that the archive should only contain code that runs/compiles with the current PB version, so there is no reason to allow Ascii encoded sources where UTF-8 would be a better choice (ie, this should only occur for old code written for versions of PB that still supported Ascii binaries creation).
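
A minimal sketch of the "UTF-8 with BOM" check, in Python for illustration only. The function name `check_encoding` and its return convention are assumptions, not part of the project.

```python
import codecs
from typing import Optional


def check_encoding(data: bytes) -> Optional[str]:
    """Return None when data is UTF-8 with BOM, else a problem description."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if not data.startswith(codecs.BOM_UTF8):
        return "missing UTF-8 BOM"
    try:
        data.decode("utf-8-sig")  # "utf-8-sig" skips the BOM while decoding
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return None
```

Note that a BOM-less file containing only Ascii characters is reported as "missing UTF-8 BOM", which matches the requirement above even though its bytes are valid UTF-8.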

tajmone commented 6 years ago

The Codes Cleaner Tool

I've looked into the CodesCleaner tool too, and documented it:

https://github.com/tajmone/PBCodeArcProto/blob/master/_tempwork/CodesCleaner.md

As far as I can tell, it only deals with removing any compiler settings that might be embedded in the source file.

Integration Considerations

I'm not sure if this functionality should be handled by the HTML pages creator:

the app could indeed check for the presence of any settings at the end of the source file (this would add quite some overhead though, as it would mean parsing every single source file up to its end),
in case it found a file with embedded settings, it could clean it up.

One thing is for sure: the removal of embedded settings should be added to the app for contributors, by which they should check/fix their code before submitting a pull request.
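
A minimal sketch of such a clean-up step, in Python for illustration only. It assumes the embedded settings form a trailing block of comment lines whose first line starts with "; IDE Options =", which is how the PB IDE normally appends them; whether the CodesCleaner tool detects the block in the same way is not confirmed here, and `strip_ide_settings` is a hypothetical name.

```python
# Assumed marker for the first line of the settings block appended by the IDE.
IDE_MARKER = "; IDE Options ="


def strip_ide_settings(source: str) -> str:
    """Remove a trailing IDE settings block, if there is one."""
    lines = source.splitlines(keepends=True)
    # Walk backwards from the end of the file: the settings block, when
    # present, consists only of ';' comment lines after the marker line.
    for index in range(len(lines) - 1, -1, -1):
        line = lines[index]
        if line.startswith(IDE_MARKER):
            return "".join(lines[:index])
        if not line.startswith(";"):
            break  # reached real code: no trailing settings block
    return source
```

Scanning from the end means only the trailing comment lines are inspected, which would keep the cost of detection low even for long source files.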

Overkill?

Whether the HTML creator's preliminary project integrity checks should go as far as parsing every single source file is something we should think about carefully, especially if we're going to make it a single app. As the project grows, this would mean that every HTML rebuild operation would have to parse every source file, line by line, till the end. I can imagine how this would become overkill when working on a single aspect (eg, tweaking the pandoc HTML template) and needing to rebuild the whole project many times.

Caching?

Of course, if we implement a cache system of some sort, storing the SHA1 of the files already checked so that the app doesn't repeat the checks on files whose SHA1 hasn't changed, then the overkill will not be a problem. I was already thinking of building a cache system, but I was hoping to postpone it to a later time.
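
A minimal sketch of the SHA1 cache idea, in Python for illustration only. The JSON cache file and the function names are assumptions; nothing here reflects an existing design in the project.

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hex SHA-1 digest of the file contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def files_needing_check(paths, cache_file: Path):
    """Return (paths whose digest differs from the cache, fresh digest map)."""
    cached = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    digests = {str(p): sha1_of(p) for p in paths}
    changed = [p for p in paths if cached.get(str(p)) != digests[str(p)]]
    return changed, digests


def save_cache(digests: dict, cache_file: Path) -> None:
    """Persist the digest map; call this only for files that passed checks."""
    cache_file.write_text(json.dumps(digests, indent=2))
```

Every file still has to be read once to compute its digest, so the saving is in the skipped checks (header parsing, syntax check via the compiler), not in disk access.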

I can't say for sure how long it would take to parse all the source files in the project integrity check stage, but from my experience with the test files I guess it's going to take some time. I know that it's never going to take ages (nothing like the first ray-tracing tools for the Amiga, which took 16 hours to produce a single image); even with hundreds of files we might still fall in the range of a few minutes (and the disk cache should speed up subsequent accesses). Also, being a statically built website, speed is not a real concern except for the maintainers at update time.

or Divorce?

I'm just wondering if having a single app handle it all is the best solution. Sure, with caching it would be ideal (and PB has all the needed libraries to implement a cache).

SicroAtGit commented 6 years ago

Single-App Integration Considerations

I'm still trying to imagine their real-world usage, and wondering whether aggregating them into a single app might add overhead to some maintenance operations. I'm saying this because I can imagine that at some point a developer might wish to work on a single aspect of the project (eg, cosmetic changes to the HTML layout), and having the app forcefully carry out all tests (especially checks on every single resource main file) could add a huge overhead in terms of time.

Yes, it would be very practical if individual tasks could be skipped. This can be easily realized with flags:

EnumerationBinary Run_Flags
  #Run_Flag_CodeCleaner
  #Run_Flag_CodeChecker
  #Run_Flag_ProjectTreeBuilder
  #Run_Flag_HTMLPagesCreator
EndEnumeration

; ### Define here the tasks which should be run:
; ### combine the wanted flags with "|" and leave out the tasks to skip
#Run_Flags = #Run_Flag_CodeCleaner | #Run_Flag_ProjectTreeBuilder

CompilerIf #Run_Flags & #Run_Flag_CodeCleaner
  ; Code of the CodeCleaner
CompilerEndIf

CompilerIf #Run_Flags & #Run_Flag_CodeChecker
  ; Code of the CodeChecker
CompilerEndIf

; and so on

If the tools were not all in a single source file, this would also be possible:

IncludeFile "CodeCleaner.pbi"
;IncludeFile "CodeChecker.pbi"
IncludeFile "ProjectTreeBuilder.pbi"
;IncludeFile "HTMLPagesCreator.pbi"

If there are several separate tools and no main tool that executes them all, a maintainer can forget to run one of them, or run them in the wrong order.

A Contributors Test Tool?

Yes, I think such a tool is important and should be offered.

Travis Continuous Integration

Surely this is a very good and very helpful thing. We can think about that later. At first glance, it looks very complex.

Besides, some scripted tools can also be added to the Git project internals, allowing Git to run the scripts during or after certain operations.

You mean Git hooks?

SicroAtGit commented 6 years ago

How is code for other OSs going to be checked?

Cross-platform code can only be checked for the current OS!

The CodeChecker must run on all three operating systems.

If the restriction were removed, the PB compiler would report errors under Linux and Mac, e.g. for constants and functions of the Windows API which are predefined in the Windows version of PureBasic.

How strict is the check for the presence of the CompilerIf #PB_Compiler_IsMainFile block?

Does it mean that all include files should contain some test code to run them on their own?

Should the lack of this block prevent the resource's inclusion in the project?

I wanted to keep this check very simple to begin with.

It should check whether pbi files contain executable code outside of procedures.

If pbi files are included in other codes, no code should be executed automatically. This is to prevent example code from being executed when the .pbi file is included in another code.

The pbi files should be like libraries (.dll (Windows), .so (Linux) and .dylib (Mac)).
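
A minimal sketch of the simple version of this check, in Python for illustration only. It merely tests whether a .pbi file contains a CompilerIf #PB_Compiler_IsMainFile line; it does not detect executable code placed outside that block, and the name `include_has_main_guard` is hypothetical.

```python
import re

# PureBasic keywords are case-insensitive, hence re.IGNORECASE.
# Anchoring at line start (after whitespace) skips commented-out guards.
IS_MAIN_FILE_RE = re.compile(
    r"^\s*CompilerIf\s+#PB_Compiler_IsMainFile\b",
    re.IGNORECASE | re.MULTILINE,
)


def include_has_main_guard(filename: str, source: str) -> bool:
    """True for non-.pbi files, and for .pbi files that contain the guard."""
    if not filename.lower().endswith(".pbi"):
        return True  # the check only applies to include files
    return IS_MAIN_FILE_RE.search(source) is not None
```

A stricter check (no executable statements outside procedures and the guarded block) would need at least a rough parse of the file, so presence of the guard is only a first approximation.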

I also suggest the following checks:

all files should be UTF-8 with BOM

Yes, I agree with you.

SicroAtGit commented 6 years ago

I'll answer the other questions another time.

tajmone commented 6 years ago

I'll need more time to elaborate on all your answers, but as for skipping checks via flags:

This can be easily realized with flags

... I'd rather have a pop-up window when the app is launched, with checkboxes that can be used to skip certain checks or tweak settings (like Debug Level, etc.).

The reason is that the PB source file is version controlled, so any change to its code will interfere with Git, showing up as a modified tracked file, and might even end up accidentally committed. So, best not to fiddle with the code IMO.

SicroAtGit commented 6 years ago

... I'd rather have a pop-up window when the app is launched, with checkboxes that can be used to skip certain checks or tweak settings (like Debug Level, etc.).

Yes, you're right, a GUI with checkboxes is much better.

But we shouldn't use the PB-DialogLib for the GUI, because under Linux the packages webkitgtk (GTK+ 3) and webkitgtk2 (GTK+ 2) are used for this and they are usually not easy to install.

On Manjaro Linux there are no precompiled versions of these packages in the package manager, so they must be obtained from the AUR repository and compiled, which takes hours.

The reason is that the PB source file is version controlled, so any change to its code will interfere with Git, showing up as a modified tracked file, and might even end up accidentally committed. So, best not to fiddle with the code IMO.

As you can see, version control is not yet fully integrated into my head, LOL. Sure, temporary settings in the code are not a good idea.

SicroAtGit commented 6 years ago

check that the following keys are present:

Author

Date

While rebuilding the archive I decided to remove these fields, because the authors and the last update time of the codes can now be determined via the copyright notes.

SicroAtGit commented 6 years ago

Caching?

Yes, I also think caching would be the best solution.

tajmone commented 6 years ago

But we shouldn't use the PB-DialogLib for the GUI...

Good thing you told me, because I was going to use it. That's a real pity: the Dialog lib is quite cool and allows fine control over the GUI.

Maybe this is the reason why I couldn't get PB to work well with GUI apps in Lubuntu virtual machines.

While rebuilding the archive I decided to remove these fields, because the authors and the last update time of the codes can now be determined via the copyright notes.

They might still be useful (in the future) if we need to calculate some statistics, or build a page with the list of all contributors and how many resources there are from each author, or even allow clicking on an author's name to show a results page with links to all his/her resources.

Also, the resume cards are not going to show the full license, just mention the type of license.

Does the date field refer to the original creation date or to the last update?

tajmone commented 6 years ago

Even though I haven't updated the project much in the last two weeks, I've been doing some local tests to figure out the best approach to modularizing the current Page Builder so that its functionality can also be used by the upcoming Codes Checker and Cleaner tools.

The main problem I've been struggling with is how to preserve the current logging and error handling when splitting the code. Currently, being a single source file, the diagnostic and debug information is simply printed to the Debug Window using some common variables in the main code, sometimes using custom macros, and often via conditional evaluation of the debug level settings. Splitting the code will also require modularizing all the variables, macros, and other functionality shared by the various processing steps, so that they can be reused by any tool.

I need to rethink the whole approach to info logging and error handling; I'll probably have to first move all logging and error handling into a common module.

Also, introducing a GUI to handle settings and to control the debug level and which operations to carry out adds another layer of complexity to logging and error handling. I'm considering that the logging module should also be able to display the various debug info and errors in a GUI text control, so I'd like to make it flexible in terms of where the output can be redirected to (options being: the GUI, the Debug Window, a report file, etc.).
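
One possible shape for such a module, sketched in Python for illustration only: a logger that filters by debug level and forwards each message to any number of registered output callbacks ("sinks"). The class and method names are hypothetical and do not describe an existing design in the project.

```python
from typing import Callable, List


class Logger:
    """Forward messages at or below the current level to every sink."""

    def __init__(self, level: int = 1) -> None:
        self.level = level
        self._sinks: List[Callable[[str], None]] = []

    def add_sink(self, sink: Callable[[str], None]) -> None:
        # A sink is any callable taking one string: print, a function
        # that appends to a GUI text control, a report file's write, etc.
        self._sinks.append(sink)

    def log(self, message: str, level: int = 1) -> None:
        if level > self.level:
            return  # message is more verbose than the current setting
        for sink in self._sinks:
            sink(message)

    def error(self, message: str) -> None:
        # Errors use level 0 so they pass any non-negative level setting.
        self.log("ERROR: " + message, level=0)


# Usage: send the same output to the console and to an in-memory report.
report: List[str] = []
logger = Logger(level=1)
logger.add_sink(print)
logger.add_sink(report.append)
logger.log("Checking resources...")
logger.log("verbose detail", level=2)  # filtered out at level 1
logger.error("header block missing")
```

With this split, the processing steps only ever call the logger, and each tool decides at startup where the output goes.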

The reason I haven't yet started to work on this is that I haven't yet come up with a fully satisfactory approach to these problems. The point is that it would make sense to split the app into modules right now, before going ahead with the pending tasks (which would otherwise have to be rewritten), and it should be done in an elegant way that would allow other tools to easily interface with the core functionality, with a friendly and documented API that hides the complexity of the details from other programmers writing more tools for the project.

I just need to gather my thoughts clearly on this, so that I can head straight for a good solution, and not end up rewriting the whole thing many times over (which means that I should also consider some possible future uses and leave some doors and options open to them).