allow file type assignment in configuration.xml (Bugzilla #6165)

vladak commented 11 years ago

Status: NEW. Severity: enhancement. Component: analyzer (for ---). Reported in version: unspecified. Platform: Other. Assigned to: Trond Norbye

On 2009-01-20 06:29:23 +0000, Vladimir Kotal wrote:

Some files are detected as the wrong or an unwanted type (see bug #6163). It would be nice to be able to configure the type of such files via configuration.xml.

We can do this with the -A command-line switch, but only by specifying suffixes (see bug #6164).

Something like this would be nice:
README* *.txt usr/src/foo/*/textfiles/*
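As a rough illustration of the request, the sketch below maps file name and path patterns to a file type. It reads the three entries in the example above as glob patterns, which is an assumption. The `TYPE_PATTERNS` table, the `assign_type` function and the "plain" type label are hypothetical; none of them is part of OpenGrok or of the configuration.xml format.

```python
import fnmatch
import posixpath

# Hypothetical pattern -> file type table, modelled on the example above.
TYPE_PATTERNS = [
    ("README*", "plain"),
    ("*.txt", "plain"),
    ("usr/src/foo/*/textfiles/*", "plain"),
]


def assign_type(path, default=None):
    """Return the configured type for a source-root-relative path.

    Patterns containing '/' are matched against the whole path,
    all other patterns against the file name only.
    """
    name = posixpath.basename(path)
    for pattern, file_type in TYPE_PATTERNS:
        target = path if "/" in pattern else name
        if fnmatch.fnmatchcase(target, pattern):
            return file_type
    return default
```

With such a table, a file without a suffix (for example a README) can still be given a type, which is the case the suffix-only -A switch does not cover.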

gmgj commented 10 years ago

1) I have to say thanks. OpenGrok is a really great idea. I tried OpenGrok to index XML and XML schema (xsd) files. xsd files are valid XML. If you look at an OASIS offering, you see really extensive xsd files with lots of namespaces. Trying to navigate the interrelationships is time-consuming and tedious. I am exploring ways to index the xsd files so that relationships, inheritance and substitution groups are accounted for. Another way to think of it: how does an indexer handle another level of indirection?

gmgj commented 10 years ago

I am not a Java programmer. I have been looking for how to use -A to add an extension. I see this:

# OPTIONAL: Allow Leading Wildcard Searches
#           (default: on)
LEADING_WILDCARD="-a on"
if [ -n "${OPENGROK_WPREFIX}" ]
then
    LEADING_WILDCARD=""
fi

But I am afraid I do not know where or how to use it. On Windows, I build the index with this:

"C:\Program Files (x86)\Java\jre7\bin\java.exe" -jar opengrok.jar -W C:\opengrok\configuration.xml -c C:\ctags58\ctags.exe -P -S -v -s C:\ma\Personal -d C:\opengrok\grokdata

What do I need to do to tell the indexer that xsd are of file type xml?

tarzanek commented 10 years ago

I added an example in the push above. @gmgj, can you please test whether it works for you?

gmgj commented 10 years ago

1) Thanks.

2) I can build systems in a lot of languages; unfortunately, Java is not one of them. I might be able to try this weekend; however, I build nothing in Java. I have written hello world and a little bit more, but basically I know nothing about Java.

3) I was going to see if you supported using the ctags configuration file (.ctags on *nix, ctags.conf on Windows):

--langdef=xml --langmap=xml:.xml.html.xhtml.xsd

4) I would take a jar file from you; however, the target is Windows 7.

5) Add support for xsd schema files (valid XML). What I am hoping for is to find a way to extend the XML parser to:

add support for namespaces

example

xmlns:j="http://niem.gov/niem/domains/jxdm/4.0"

    <j:CaseAugmentation>
        <j:CaseCourt>
            <nc:OrganizationIdentification>
                <nc:IdentificationID>MASJC</nc:IdentificationID>
            </nc:OrganizationIdentification>
            <j:CourtName>Supreme Judicial Court for the Commonwealth of Massachusetts</j:CourtName>
        </j:CaseCourt>

add support for inheritance and substitution groups example

add support for other tag attributes

This <tag> is cool & pretty. This is actually just data.. ]]>

Sample schema file (only the documentation strings of the schema are preserved, up to the closing /xsd:schema):

AppellateCase true

Party added to the appeal that was not a party in the original case. For instance, the attorney in the original case may appeal sanctions against the attorney by the court.

Party to the original case that is not party to the appeal.

Information required to initiate a new case in an appellate court.

Additional information specific to civil appellate cases.

Additional information specific to court rule appellate cases.

Information required to initiate a new case in an appellate court.

Party added to the appeal that was not a party in the original case. For instance, the attorney in the original case may appeal sanctions against the attorney by the court.

The basis for the jurisdiction of the appellate court in the case. true

A party being added or removed an appeal..

The reason a party is being added to the appeal. true

The reason a party is being removed from the appeal. true

Party to the original case that is not party to the appeal.

A request for diversion to a settlement program in the appellate court. true

Additional information specific to civil appellate cases.

Additional information specific to court rule appellate cases.

Indicator that the appellant is currently in custody.

Indicator that filing fees were waived or deferred in the case in the lower court.

An organized set or book of rules of the court that include the rule(s) in question. true

A rule number (including rule subsection) in question. Each rule number must refer to a specific rule within the rule collection.

/xsd:schema
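To make the namespace request more concrete, here is a minimal sketch of namespace-aware parsing with Python's standard library; it only shows what a namespace-aware indexer would see and is not how OpenGrok's XML analyzer works. The `j` namespace URI is taken from the example above. The `nc` URI is a placeholder, because the example does not declare it, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The nc namespace URI below is a placeholder (assumption); the thread
# only gives the URI for the j prefix.
SAMPLE = """\
<j:CaseAugmentation xmlns:j="http://niem.gov/niem/domains/jxdm/4.0"
                    xmlns:nc="http://example.org/nc">
  <j:CaseCourt>
    <nc:OrganizationIdentification>
      <nc:IdentificationID>MASJC</nc:IdentificationID>
    </nc:OrganizationIdentification>
    <j:CourtName>Supreme Judicial Court</j:CourtName>
  </j:CaseCourt>
</j:CaseAugmentation>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def namespaced_elements(xml_text):
    """Return (namespace URI, local name) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [split_tag(element.tag) for element in root.iter()]
```

An index keyed on the (URI, local name) pair rather than on the prefixed name would treat `j:CourtName` the same way no matter which prefix a given file binds to that namespace.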

tarzanek commented 10 years ago

2) We tried to make it easy and to document most things. Try downloading NetBeans and follow our wiki pages about development; you can do it!

3) Supported out of the box; see the OpenGrok script, which has an option for adding ctags.conf.

4) Jars are portable. For the change I asked you to test, you just need to edit the OpenGrok script or, if you're on Windows 7, add the following after -jar opengrok.jar: -A cs:org.opensolaris.opengrok.analysis.PlainAnalyzer (change the extension cs (to xsd?) and PlainAnalyzer (to XMLAnalyzer?) to what you deem right).

5) Add support for xsd schema files (valid XML): it would be VERY NICE if we could parse XML better than we do now. Syntax is no problem (you can adjust the existing analyzer and its JFlex file); the semantic part is the hard thing. ctags does that for us, so if ctags doesn't recognize the file (language), no added value is indexed (no notion of methods, definitions or namespaces). So THIS will be the tricky part, and it seems you would need to find some tool (or some XML/XSD library) or write some code that does semantic analysis to achieve what you want (alternatively, try 3), but I don't know how powerful ctags can be when you define your own language).

Hope that explains it; if not, ask. L
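The effect of an extension:analyzer argument can be pictured with a small sketch. This is not OpenGrok's implementation: the function names, the short analyzer labels ("XMLAnalyzer", "PlainAnalyzer") and the choice to compare extensions case-insensitively are all assumptions made for illustration.

```python
def parse_extension_spec(spec):
    """Split an 'ext:analyzer' argument (as passed after -A) into its parts.

    A leading dot on the extension is accepted and dropped.
    """
    ext, sep, analyzer = spec.partition(":")
    ext = ext.lstrip(".").lower()
    if not sep or not ext or not analyzer:
        raise ValueError("expected 'ext:analyzer', got %r" % spec)
    return ext, analyzer


def analyzer_for(filename, overrides, default="PlainAnalyzer"):
    """Pick an analyzer label for a file name from the override table."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension at all: nothing to look up.
        return default
    return overrides.get(ext.lower(), default)
```

For example, building `overrides = dict([parse_extension_spec("xsd:XMLAnalyzer")])` makes `analyzer_for("AppellateCase.xsd", overrides)` pick the XML label, while files with unlisted extensions fall back to the default.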

gmgj commented 10 years ago

Will try this weekend. See

https://www.niem.gov/ https://www.oasis-open.org/

for folks who are using XML with namespaces, substitution groups, inheritance, etc. They are standards groups.

gmgj commented 10 years ago

1) I gave it a shot.
2) I downloaded NetBeans. This page, https://github.com/OpenGrok/OpenGrok/wiki/How-to-build-OpenGrok-from-source, says under "Prepare the source for compilation": Copy JFlex.jar into the lib directory in the OpenGrok source. (If you are using NetBeans, you could alternatively add JFlex.jar to Ant's classpath at Tools->Options->Miscellaneous->Ant. If you are running Ant from the command line, it should also work if you put JFlex.jar into your ~/.ant/lib directory.) Optionally you need junit*.jar there too.

3) On the latest NetBeans for Windows, I believe the Ant setting is at Tools->Options->Java->Ant.

4) Anyway, I ran git clone https://github.com/OpenGrok/OpenGrok

5) I opened OpenGrok in NetBeans and it wanted:

jrcs.jar
lucene-suggest.jar
lucene-analyzers-common.jar
lucene-core.jar
lucene-queryparsers.jar

I am not confident that after I add https://code.google.com/p/openhrok for jrcs.jar (not listed as a dependency) I will get any farther.


tarzanek commented 10 years ago

AH :)

OK, I will need to improve the wiki.

Just open the project in NetBeans and make it build; it will download ALL that is needed.

I will improve the wiki and the build docs. It's no longer necessary to do anything else: just open the project, ignore any error messages and do CLEAN & BUILD.

@gmgj sorry for the confusion. So just clone, open in NetBeans, ignore any silly errors and build. For running, check the OpenGrok script; it should be usable for both production and development runs (it can recognize the build from a cloned workspace).

gmgj commented 10 years ago

1) Thanks.

2) It's a LOT of work to do something like OpenGrok.

3) Not the least of which is how you get people to use your project, even if it's great. I suggested a meetup on source browsers here: http://www.meetup.com/bostonphp/

4) For the non-Java guy who has just installed NetBeans, it's a little confusing. I pushed a few buttons and there were 4 types of projects. So when you clean up the wiki, I would say: a) for the non-Java guy, NetBeans makes it easy; to build, go to File, Open, navigate to where you put the source, and from the Run menu hit Clean and Build.

5) When I installed JFlex I was a little concerned; it looked like it had not been changed in 8 years or so. My reaction: this can't be right.

6) The next thing I am going to do is try to view XML files in NetBeans.

7) What file do I do this in?

or if you're on WIN7 add after -jar opengrok.jar -A cs:org.opensolaris.opengrok.analysis.PlainAnalyzer

(change extension cs (to xsd?) and PlainAnalyzer (to XMLAnalyzer?) to what you deem right)

a) This is my bat file to create the configuration.xml:

"C:\Program Files (x86)\Java\jre7\bin\java.exe" -jar opengrok.jar -W C:\opengrok\configuration.xml -c C:\ctags58\ctags.exe -P -S -v -s C:\ma\Personal -d C:\opengrok\grokdata
pause

b) This is a section of configuration.xml:

Output of clean and build:

ant -f C:\opengroksrc -Dnb.internal.action.name=run run
init:
Deleting: C:\opengroksrc\build\built-jar.properties
deps-jar:
Updating property file: C:\opengroksrc\build\built-jar.properties
jrcs:
download-jflex:
jflex:
download-lucene:
compile:
run:
Usage: opengrok.jar [options]
-? Help
-A .ext|prefix.:analyzer Files with the named prefix/extension should be analyzed with the specified class
-a on/off Allow or disallow leading wildcards in a search
-B url Base URL of the user Information provider. Default: "http://www.myserver.org/viewProfile.jspa?username="
-C Print per project percentage progress information(I/O extensive, since one read through dir structure is made before indexing, needs -v, otherwise it just goes to the log)
-c /path/to/ctags Path to Exuberant Ctags from http://ctags.sf.net by default takes the Exuberant Ctags in PATH.
-D Store history cache in a database (needs the JDBC driver in the classpath, typically derbyclient.jar or derby.jar)
-d /path/to/data/root The directory where OpenGrok stores the generated data
-e Economical - consumes less disk space. It does not generate hyper text cross reference files offline, but will do so on demand - which could be sightly slow.
-G Assign commit tags to all entries in history for all repositories.
-H Generate history cache for all repositories
-h /path/to/repository just generate history cache for the specified repos (absolute path from source root)
-I pattern Only files matching this pattern will be examined (supports wildcards, example: -I *.java -I *.c)
-i pattern Ignore the named files or directories (supports wildcards, example: -i *.so -i *.dll)
-j class Name of the JDBC driver class used by the history cache. Can use one of the shorthands "client" (org.apache.derby.jdbc.ClientDriver) or "embedded" (org.apache.derby.jdbc.EmbeddedDriver). Default: "client"
-k /path/to/repository Kill the history cache for the given repository and exit. Use '*' to delete the cache for all repositories.
-K List all repository pathes and exit.
-L path Path to the subdirectory in the web-application containing the requested stylesheet. The following factory-defaults exist: "default", "offwhite" and "polished"
-l on/off Turn on/off locking of the Lucene database during index generation
-m number Amount of memory that may be used for buffering added documents and deletions before they are flushed to the Directory(default 16.0MB). Please increase JVM heap accordingly, too.
-N /path/to/symlink Allow this symlink to be followed. Option may be repeated. By default only symlinks directly under source root directory are allowed.
-n Do not generate indexes, but process all other command line options
-O on/off Turn on/off the optimization of the index database as part of the indexing step
-o path File with extra command line options for ctags
-P Generate a project for each of the top-level directories in source root
-p /path/to/default/project This is the path to the project that should be selected by default in the web application(when no other project set either in cookie or in parameter). You should strip off the source root.
-Q on/off Turn on/off quick context scan. By default only the first 1024k of a file is scanned, and a '[..all..]' link is inserted if the file is bigger. Activating this may slow the server down (Note: this is setting only affects the web application)
-q Run as quietly as possible
-R /path/to/configuration Read configuration from the specified file
-r on/off Turn on/off support for remote SCM systems
-S Search for "external" source repositories and add them
-s /path/to/source/root The root directory of the source tree
-T number The number of threads to use for index generation. By default the number of threads will be set to the number of available CPUs
-t number Default tabsize to use (number of spaces per tab character)
-U host:port Send the current configuration to the specified address (This is most likely the web-app configured with ConfigAddress)
-u url URL to the database that contains the history cache. Default: If -j specifies "embedded", "jdbc:derby:$DATA_ROOT/cachedb;create=true"; otherwise, "jdbc:derby://localhost/cachedb;create=true"
-V Print version and quit
-v Print progress information as we go along
-W /path/to/configuration Write the current configuration to the specified file (so that the web application can use the same configuration
-w webapp-context Context of webapp. Default is /source. If you specify a different name, make sure to rename source.war to that name. Also FULL reindex is needed if this is changed.
-X url:suffix URL Suffix for the user Information provider. Default: ""
-z number depth of scanning for repositories in directory structure relative to source root. Default is 3 .
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
Java Result: 1
BUILD SUCCESSFUL (total time: 0 seconds)


gmgj commented 10 years ago

Thanks again. I did want to clarify 2 things about the above:

2) It's a LOT of work to do something like OpenGrok. I am referring to the work that you and the rest of the contributors are doing.

7) What file do I do this in?

or if you're on WIN7 add after -jar opengrok.jar -A cs:org.opensolaris.opengrok.analysis.PlainAnalyzer

(change extension cs (to xsd?) and PlainAnalyzer (to XMLAnalyzer?) to what you deem right)

gmgj commented 10 years ago

This plugin for NetBeans provides some of the schema parsing (and related features) that I was looking for: https://blogs.oracle.com/geertjan/entry/xml_schema_editor_in_netbeans

tarzanek commented 10 years ago

7) You pass that option to the indexer; the xsd extension will then be recognized by the appropriate parser, and the output will look better than with PlainAnalyzer.

To do what the XML schema editor does in NetBeans, you would need to completely rewrite the XREF output of an analyzer (for XSDs, you can clone the XML one). This rewrite won't be easy, however (but it can be done if you understand DOM/SAX parsers or use some library to do it for you).
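As a hedged illustration of the SAX route mentioned above, the sketch below collects named definitions and their substitution groups from a schema using Python's standard library. It is not an OpenGrok analyzer and produces no XREF output; the handler class, the function name, and the `AppellateCaseType` and `CaseAugmentationPoint` names in the sample are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical sample schema; the type and substitution group names are invented.
SAMPLE_XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AppellateCaseType"/>
  <xsd:element name="AppellateCase" type="AppellateCaseType"
               substitutionGroup="CaseAugmentationPoint"/>
</xsd:schema>
"""


class SchemaDefinitions(ContentHandler):
    """Collect (kind, name, substitutionGroup) for named schema components."""

    DEFINING = {"element", "complexType", "simpleType", "attribute"}

    def __init__(self):
        super().__init__()
        self.definitions = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri != XSD_NS or local not in self.DEFINING:
            return
        # In namespace mode, unqualified attributes are keyed as (None, name).
        def_name = attrs.get((None, "name"))
        if def_name is None:
            return  # a reference (ref="...") rather than a definition
        group = attrs.get((None, "substitutionGroup"))
        self.definitions.append((local, def_name, group))


def extract_definitions(xsd_text):
    """Parse schema text and return the list of collected definitions."""
    handler = SchemaDefinitions()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xsd_text.encode("utf-8")))
    return handler.definitions
```

A list like this is roughly the kind of definition data that ctags supplies for languages it knows; turning it into cross-reference links would still be the larger part of the work.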

hth L

oracle / opengrok

allow file type assignment in configuration.xml (Bugzilla #6165) #605
