mowolf / ChatAnalyzer

Java script webapp that analyzes your WhatsApp Chat history locally on your machine.
https://chatanalyzer.moritzwolf.com
Other
208 stars 41 forks source link

enable code bundling #30

Closed Nilegfx closed 6 years ago

Nilegfx commented 6 years ago

in order to start better tests, I think we need to bundle, at least for now analyzer.js, feedback.js and form.js

I can add webpack bundling support if you are planning to do so.

mowolf commented 6 years ago

I don‘t know much about webpack and as far as I can see this will not really be needed? Or what benefits would that bring?

Sent with GitHawk

Nilegfx commented 6 years ago

Right now the project awesome and it DOES THE JOB, but it is tightly coupled, (the frontend, analyzer .. etc) all together.

wouldn't be great if we can breakdown the analyz module into separate npm package that does only the analysis (no browser dependencies or DOM manipulations) then include it in the current repo as a dependency. this will allow you to reuse/test the most important feature in this application (analyzer), maybe building a mobile app, maybe using it on the server-side (nodejs) .. etc

In order to divide the analyzer functionality (tokenizer, time extractor, find users .. etc) we need a way to purify these functionalities as much as possible, so we can unit test each one separately and confidently maintain and add more feature to it.

Would also be easy to write modern and latest javascript syntax (ES6, 7, 8 .. etc) by adding babel loader to webpack, which will ensure the package is always future-proof in terms of javascript language/syntax.

an example (I am using jest for testing):

I created three files:

tokenizer.js

export const tokenizer = function ( content ) {
  let explainedPattern = [

    //might start with [
    "(\\[?)",

    /*
    * matches two occurrences of (1-4 digits followed by single character (a dash '-' or slash '/' or a dot '.')
    * representing day and month,  followed by 2-4 digits representing the year), this means it will match all the following:
    *
    * single day, single month, year at the end
    * - 1.1.18
    * - 1.1.218
    * - 1.1.2018
    *
    * double day, single month, year at the end
    * - 12.1.18
    * - 12.1.218
    * - 12.1.2018
    *
    * double day, double month, year at the end
    * - 12.12.18
    * - 12.12.218
    * - 12.12.2018
    *
    * single day, single month, year at the beginning
    * - 18.1.1
    * - 218.1.1
    * - 2018.1.1
    *
    * double day, single month, year at the beginning
    * - 18.12.1
    * - 218.12.1
    * - 2018.12.1
    *
    * double day, double month, year at the beginning
    * - 18.12.12
    * - 218.12.12
    * - 2018.12.12
    *
    * single day, single month, year at the end
    * - 1-1-18
    * - 1-1-218
    * - 1-1-2018
    *
    * double day, single month, year at the end
    * - 12-1-18
    * - 12-1-218
    * - 12-1-2018
    *
    * double day, double month, year at the end
    * - 12-12-18
    * - 12-12-218
    * - 12-12-2018
    *
    * single day, single month, year at the beginning
    * - 18-1-1
    * - 218-1-1
    * - 2018-1-1
    *
    * double day, single month, year at the beginning
    * - 18-12-1
    * - 218-12-1
    * - 2018-12-1
    *
    * double day, double month, year at the beginning
    * - 18-12-12
    * - 218-12-12
    * - 2018-12-12
    *
    * single day, single month, year at the end
    * - 1/1/18
    * - 1/1/218
    * - 1/1/2018
    *
    * double day, single month, year at the end
    * - 12/1/18
    * - 12/1/218
    * - 12/1/2018
    *
    * double day, double month, year at the end
    * - 12/12/18
    * - 12/12/218
    * - 12/12/2018
    *
    * single day, single month, year at the beginning
    * - 18/1/1
    * - 218/1/1
    * - 2018/1/1
    *
    * double day, single month, year at the beginning
    * - 18/12/1
    * - 218/12/1
    * - 2018/12/1
    *
    * double day, double month, year at the beginning
    * - 18/12/12
    * - 218/12/12
    * - 2018/12/12
    * */
    "((\\d{1,4}(\\-|\\/|\\.){1}){2}\\d{1,4})",

    /*
    * separator between date and time with one occurrence for one of:
    * - a space followed by 1-3 characters then another space
    * - a comma followed by a space
    * - any character followed by a space
    * */

    "((\\s.{1,3}\\s|\\s)|,\\s|\\.\\s){1}",

    // a group for date and time
    "(",

    /*
    * match time
    * match 1-2 digits followed by a colon ':' (hours), followed by two digits (minutes),
    * followed by an optional (colon followed by two digits) which are the seconds
    * */
    "((\\d{1,2}\\:)\\d{2}(:\\d{2})?)",

    /*
    * day/night
    * */
    "(\\s(a|p)m?|\\s(A|P)?M|\\s(a|p)?\\.(\\s)?\\m.)?",

    ")",

    /*
    * it might end with one of:
    * - ']' followed by a space
    * - a space followed by a dash '-' followed by another space
    * - a colon ':'
    * */
    "(\\]\\s|\\s\\-\\s|\\:)"
  ].join( "" );

  let re = new RegExp( explainedPattern, "g" );
  return re.exec( content )
}

tokenizer.test.js

import { usecases } from './usecases';
import { tokenizer } from './tokenizer';

describe( 'tokenizer', () => {

  it( 'should cover all use cases', () => {
    usecases.forEach( ( [ usecase, expectedResult ] ) => {

      if ( expectedResult ) {
        //str.match returns array-like object which has more
        //properties then the regular array (index, input, groups .. )
        //this is why we wrapped it in Array.from
        expect( Array.from( tokenizer( usecase ) ) ).toEqual( expectedResult );
      } else {
        expect( tokenizer( usecase ) ).toBeFalsy();
      }
    } );
  } );
} );

usecases.js

export const usecases = [
  [
    "11/27/16, 21:06 - NAME: text",
    [ "11/27/16, 21:06 - ", "", "11/27/16", "27/", "/", ", ", undefined, "21:06", "21:06", "21:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "8/22/14, 12:41:13 PM: NAME: text",
    [ "8/22/14, 12:41:13 PM:", "", "8/22/14", "22/", "/", ", ", undefined, "12:41:13 PM", "12:41:13", "12:", ":13", " PM", undefined, "P", undefined, undefined, ":" ]
  ],
  [
    "3/21/18, 11:58 AM - NAME: text",
    [ "3/21/18, 11:58 AM - ", "", "3/21/18", "21/", "/", ", ", undefined, "11:58 AM", "11:58", "11:", undefined, " AM", undefined, "A", undefined, undefined, " - " ]
  ],
  [
    "14/01/2018, 09:54 - NAME: text",
    [ "14/01/2018, 09:54 - ", "", "14/01/2018", "01/", "/", ", ", undefined, "09:54", "09:54", "09:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "26/02/2015, 17:54:05: NAME: text",
    [ "26/02/2015, 17:54:05:", "", "26/02/2015", "02/", "/", ", ", undefined, "17:54:05", "17:54:05", "17:", ":05", undefined, undefined, undefined, undefined, undefined, ":" ] ],
  [
    "4/23/18, 11:42 - NAME: text",
    [ "4/23/18, 11:42 - ", "", "4/23/18", "23/", "/", ", ", undefined, "11:42", "11:42", "11:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "29-07-17 22:58 - NAME: text",
    [ "29-07-17 22:58 - ", "", "29-07-17", "07-", "-", " ", " ", "22:58", "22:58", "22:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "07-11-17, 20:05 - NAME: Message",
    [ "07-11-17, 20:05 - ", "", "07-11-17", "11-", "-", ", ", undefined, "20:05", "20:05", "20:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "04.02.17 tan 22:45",
    undefined
  ],
  [
    "04.02.17 á 22:45",
    undefined
  ],
  [
    "04.02.17 um 22:45 - NAME: text",
    [ "04.02.17 um 22:45 - ", "", "04.02.17", "02.", ".", " um ", " um ", "22:45", "22:45", "22:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "‎26.11.17 um 22:06 - Sophia Müller99: Achso",
    [ "26.11.17 um 22:06 - ", "", "26.11.17", "11.", ".", " um ", " um ", "22:06", "22:06", "22:", undefined, undefined, undefined, undefined, undefined, undefined, " - " ]
  ],
  [
    "[4/28/18 1:16:27 a. m.] NAME: text",
    [ "[4/28/18 1:16:27 a. m.] ", "[", "4/28/18", "28/", "/", " ", " ", "1:16:27 a. m.", "1:16:27", "1:", ":27", " a. m.", undefined, undefined, "a", " ", "] " ]
  ],
  [
    "[2017-03-13, 9:47:53 PM] NAME: text",
    [ "[2017-03-13, 9:47:53 PM] ", "[", "2017-03-13", "03-", "-", ", ", undefined, "9:47:53 PM", "9:47:53", "9:", ":53", " PM", undefined, "P", undefined, undefined, "] " ]
  ],
  [
    "[21/12/2015, 13:42:37] NAME: text",
    [ "[21/12/2015, 13:42:37] ", "[", "21/12/2015", "12/", "/", ", ", undefined, "13:42:37", "13:42:37", "13:", ":37", undefined, undefined, undefined, undefined, undefined, "] " ]
  ],
  [
    "[18-01-16 15:47:49] Venours: Hahaha",
    [ "[18-01-16 15:47:49] ", "[", "18-01-16", "01-", "-", " ", " ", "15:47:49", "15:47:49", "15:", ":49", undefined, undefined, undefined, undefined, undefined, "] " ]
  ]
];

now, we can separate the tokenizing functionality and require/use it in analyzer.js, we can import tokenizer something like:

analyzer.js

import { tokenizer } from 'tokenizer';

//..later on in the code 

while ((match = tokenizer(content)) != null) {
     //console.log("match found at " + match.index);
     indexArray[i] = match.index;
     // currently this array stores the index after the identifier
     messageStartIndexArray[i] = match[0].length;
     //nameArray[i] = match[16];
     timeArray[i] = match[7];
     dateArray[i] = match[2];
     i++;
   }

Now, giving that we isolated the tokenizer feature/functionality from the bigger analyzer module, wrote some tests, we can clearly and confidently modify/simplify the regular expression, divide it into smaller functions that does one thing at a time (extract date, extract time, extract day/night .. etc).

With a simple webpack configuration file and one command line, we can bundle (and maybe minify and uglify) analizer.js and all its imports/sub modules into one file called analizer.min.js and then use this in our frontend application or any other kind of applications.

I have an experience in doing this for multiple projects, so If you would like to have this kind of separation of concerns, let me know and I will hehlp you by create a PR as soon as I am free.

mowolf commented 6 years ago

That sounds great! Thanks for your explanation @Nilegfx ! Sure you are very welcome to help and implement this! If I can I do something to make this easy for you just ask.

Sent with GitHawk

Nilegfx commented 6 years ago

@mowolf

Hope you had a nice long holiday 😉

Please have a look at these two repos I created:

ChatAnalyzer (refactored):

This repo hold the webapp, DOM manipulation, and server side code I would highly recommend to get rid of jquery and manual DOM manipulation and use modern framework like react/angular/vuejs, I can also help you moving it slowly to react which will definitely improve the maintainability of the webapp.

I'll push the changes as PR to your repo if we agreed to go with this plan

refactor advantages:

refactor disadvantages:

chat-analyzer (new):

This repo is a temp one until you create your own repo and then we can push the code there (just to preserve the credit to you) You also need to create an npm account (if you don't already have one) that will hold the final distributed package in order to be able to install/use it in any other application npm install <YOUR_PACKAGE_NAME>

The plan for this repo is to hold the analysis codebase, extractions and data manipulation. it will be a dependency in your webapp.

We should move all data layer logic from the big giant analyze.js to this repo one by one, each peace of logic should be added there with respective unit tests and optionally some functional tests. an example of this it the tokenizer module that I added there.

This repo is ready for publishing as npm package and also shipped with latest ES6+ enablement.

I didn't have time to write development documentations (how to link two repos while development, how to publish .. etc) but I am willing to do so if we agreed to go with this way.

There is a lot more to do to enhance the app performance.

overall observations after digging deeper into the code

I am happy to help improving this very nice wepapp idea, so if you need any help or have any questions regarding all the above, please don't hesitate to ask for clarifications regardless.

mowolf commented 6 years ago

@Nilegfx hope you enjoyed your holiday as well!

It seems that you know what you are talking about and I am okay with moving it the direction you think is best.

Just to make you thinks clear: This is my first self written js application. I did this as a final project for the cs50 course (free cs course from Harvard), so I used what I knew and it became messy really fast. However I really like this project as it is a tool people want to use as well as to learn more coding! So here is what I think:

Thanks for you help :). To address any issues you had with refactoring I will have a look at your repo. And in case you want to talk about it to make it more clear for both of us just tell me.

Sent with GitHawk

mowolf commented 6 years ago

So I created the repo chat-analyzer you can push your code to.

git remote add origin https://github.com/mowolf/chat-analyzer.git
git push -u origin master
mowolf commented 6 years ago

@Nilegfx

no single test for any thing in this app. this made it harder for me to do this refactoring because I was not sure what are the use cases that I should consider when I refactor and move stuff around, so I just used my common sense and the obvious use case. if the refactoring dropped other use cases that I didn't notice, please let me know.

Here is a spreadsheet containg the usecases Also I just found a mistake in my regex. see https://github.com/mowolf/ChatAnalyzer/commit/47a3ea87638a1bbf68f97f6fc7c972d6c0481efd

Once you pushed the chat-analyzer I will change that in the tokenizer.js. I will add more usecases as well ;)

Nilegfx commented 6 years ago

@mowolf cool!

Looks like you forgot to add me to the new repo as a contributor, please add me in order to be able to push the code there.

Regarding the tests and use cases, I didn't mean the analyzer use cases, the excel sheet I saw before and this is the one I built the first tests based on, but I meant the whole web app use cases/screens, maybe a hidden feature that I didn't see .. etc.

I saw it clearly that you either not a javascript developer or it is your first self-written js application, to be honest, I thought you are a data scientist or something like that. but nevertheless, you did a GREAT job by having a great idea and a good start for better implementation that we together can improve and build on top of it.

let's keep the (react) part aside for some time until we "productionize" the analyzer and ensures its data structure, stability and performance.

Defiantly we can have multiple sessions on skype, phone calls or even if you ever had the chance to visit Berlin someday, we could meet (I saw you are in Munich, right?)

My skype handler is: ahmed.ayoub.hussain

mowolf commented 6 years ago

@Nilegfx you are added now. Yes I am in Munich. Surely we can meet sometime in Berlin (my brother is living there) ;)

Will add you on skype for the meantime

Nilegfx commented 6 years ago

I still don't have write access.

git push --set-upstream origin master
remote: Permission to mowolf/chat-analyzer.git denied to Nilegfx.
fatal: unable to access 'https://github.com/mowolf/chat-analyzer.git/': The requested URL returned error: 403
mowolf commented 6 years ago

https://github.com/mowolf/chat-analyzer/invitations

I guess you need to accept the invitation.