rvasa / jseat

Automatically exported from code.google.com/p/jseat

Version date extraction from the JAR file #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently we do not capture the build release date. At present this information
can only be obtained by scanning the release notes.

Every JAR file records the date each class file was created, as well as the
date the manifest was created.

We need to be able to pick up this information directly from the JAR file
and store it in the version. This will allow us to plot growth using the
calendar date information.

Once this date has been captured, we will then need to baseline the elapsed
time. 

So, Version 1 will be released on DAY 1,
Version 2 will be released on DAY 54, etc. (this shows the number of
days elapsed since the very first version). When combined with the growth
information, this can reveal periods of high activity. Example: a release
soon followed by a number of minor edits, and then a period of relative
inactivity.
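A minimal sketch of the DAY baseline described above, assuming the release dates are already available as `java.time.LocalDate` values in release order (`java.time` is Java 8+, newer than the code base discussed here). The class name `ElapsedDays`, the method `baseline`, and the sample dates are hypothetical. The first release counts as DAY 1, so each day number is the days elapsed since the first release plus one.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class ElapsedDays {
    // Converts release dates (sorted, non-empty) into day numbers relative to
    // the first release, which is counted as DAY 1.
    public static List<Long> baseline(List<LocalDate> releaseDates) {
        LocalDate first = releaseDates.get(0);
        List<Long> days = new ArrayList<>();
        for (LocalDate date : releaseDates) {
            days.add(ChronoUnit.DAYS.between(first, date) + 1);
        }
        return days;
    }

    public static void main(String[] args) {
        // Hypothetical release dates
        List<LocalDate> dates = List.of(
                LocalDate.of(2007, 1, 1),
                LocalDate.of(2007, 2, 23),
                LocalDate.of(2007, 3, 1));
        System.out.println(baseline(dates)); // prints [1, 54, 60]
    }
}
```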

This information can also show us whether projects have a regular activity rate
or follow some other release pattern.

Additional comments:
Although one might expect open-source developers to contribute irregularly,
a normally evolving project should have a consistent growth rate. This means
that over time we should see a tendency towards linear growth.
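One possible way to check for a "tendency towards linear growth" is an ordinary least-squares fit of a size metric against days elapsed, with R² as a rough indication of how linear the growth is. This is only a sketch under that assumption, not something the project implements; `GrowthTrend`, `fitLine`, and the sample data are hypothetical.

```java
public class GrowthTrend {
    // Ordinary least-squares fit of y = intercept + slope * x.
    // x: days elapsed since the first release; y: a size metric (e.g. class count).
    // Returns {slope, intercept, rSquared}. Assumes at least two distinct x values.
    public static double[] fitLine(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        // R^2 near 1 means size grows close to linearly with calendar time;
        // a constant series (syy == 0) is treated as a perfect fit.
        double rSquared = (syy == 0) ? 1.0 : (sxy * sxy) / (sxx * syy);
        return new double[] {slope, intercept, rSquared};
    }

    public static void main(String[] args) {
        // Hypothetical data: day numbers and class counts for four releases
        double[] days = {1, 54, 60, 120};
        double[] classes = {100, 153, 159, 219};
        double[] fit = fitLine(days, classes);
        System.out.printf("slope=%.2f classes/day, R^2=%.3f%n", fit[0], fit[2]);
    }
}
```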

Original issue reported on code.google.com by rajesh.v...@gmail.com on 17 Jul 2007 at 7:55

GoogleCodeExporter commented 9 years ago
I was just thinking about this the other day. Ultimately I envisage being able
to switch between viewing a report/visualization's chronology as either RSN or
date.

It may be worth creating a date utility for creating and converting the
baseline, for example converting between days/months/years, for which the
MetricTable data set of a report would have to be scaled accordingly as well.
Based on this, I was wondering whether we could 'compress' or 'expand' the
granularity of our reports. Since all data for a report is stored in a
MetricTable, we would probably have to provide a utility to scale this
appropriately.

This would make it easier to view long-term trends if you have a large number
of versions (say 20-30+ releases). This is something that currently doesn't
scale well on our visualizations unless you have a very wide screen.

Examples:

RSN 1, RSN 2, RSN 3

Day1, Day 23, Day 54

Jan, Feb, Mar

This is something I want to think about more for a future fix. A straight
conversion from RSN to days, however, would be a fairly quick fix and can be
done in the interim.
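The three chronology views in the examples above could be produced by a small label utility along these lines. This is a sketch under assumptions: `ChronologyLabels` and `Axis` are hypothetical names, and `releaseDates.get(i)` is assumed to hold the release date of RSN i + 1.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ChronologyLabels {
    // Hypothetical axis modes matching the RSN / Day / month examples
    public enum Axis { RSN, DAYS, MONTHS }

    // releaseDates.get(i) is the release date of RSN i + 1 (sorted, non-empty)
    public static List<String> labels(List<LocalDate> releaseDates, Axis axis) {
        LocalDate first = releaseDates.get(0);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < releaseDates.size(); i++) {
            LocalDate date = releaseDates.get(i);
            switch (axis) {
                case RSN:
                    out.add("RSN " + (i + 1));
                    break;
                case DAYS:
                    // First release is Day 1
                    out.add("Day " + (ChronoUnit.DAYS.between(first, date) + 1));
                    break;
                case MONTHS:
                    out.add(date.getMonth().getDisplayName(TextStyle.SHORT, Locale.ENGLISH));
                    break;
            }
        }
        return out;
    }
}
```

With releases on 1 Jan, 23 Jan and 23 Feb 2007, the three views give "RSN 1, RSN 2, RSN 3", "Day 1, Day 23, Day 54" and "Jan, Jan, Feb". The repeated month label is where compressing several releases into one period would become necessary.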

Original comment by jtha...@gmail.com on 17 Jul 2007 at 3:21

GoogleCodeExporter commented 9 years ago
If we are working with an archive or JAR file, we can use the time the entry
was added to the archive. Otherwise, we can use the lastModified() time of a
standard File.

Because the InputDataSet currently provides an Iterator<InputStream>, it is not
easy for an object using it to obtain the creation or modification date of the
respective stream.

My current thinking is to provide a new class called InputData. Something like
this...

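import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;

// Represents an InputData file (a class file on disk or a JAR/ZIP entry) in an InputDataSet
class InputData
{
  private final InputStream stream;
  private final long lastModified; // milliseconds since the epoch

  // Opening the file can fail, so the constructor declares IOException
  public InputData(File file) throws IOException
  {
    this.stream = new BufferedInputStream(new FileInputStream(file));
    this.lastModified = file.lastModified();
  }

  // stream is the entry's content, e.g. obtained from ZipFile.getInputStream(entry)
  public InputData(ZipEntry entry, InputStream stream)
  {
    this.stream = stream;
    this.lastModified = entry.getTime(); // -1 if the archive records no time
  }

  // Caller should close stream
  public InputStream getInputStream()
  {
    return stream;
  }

  public long getLastModifiedTime()
  {
    return lastModified;
  }
}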

It would then be appropriate to change the direct InputStream iterator to an
InputData iterator, i.e. Iterator<InputData>.

You can still access the InputStream in a convenient manner.
You can retrieve the last modification date associated with an InputStream.
You can treat zip file entries and files the same way.
Future fields could be added if needed.
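For the JAR case, the entry times could be read with the standard `java.util.zip` API. The sketch below collects them into a map keyed by entry name rather than into the proposed InputData objects; `JarEntryTimes` and `classEntryTimes` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.Date;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarEntryTimes {
    // Maps each .class entry name in the archive to its modification time
    // (milliseconds since the epoch, or -1 if the archive records no time).
    public static Map<String, Long> classEntryTimes(File jar) throws IOException {
        Map<String, Long> times = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(jar)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
                    times.put(entry.getName(), entry.getTime());
                }
            }
        }
        return times;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarEntryTimes <file.jar>");
            return;
        }
        for (Map.Entry<String, Long> e : classEntryTimes(new File(args[0])).entrySet()) {
            System.out.println(e.getKey() + " " + new Date(e.getValue()));
        }
    }
}
```

ZIP entries store times in MS-DOS format, so values have two-second resolution and are interpreted in the local time zone.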

You can get the actual time or date using a Calendar instance.
A DateUtility could do this, but getting it as a Date, for example, looks like this...
InputData id = //...
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(id.getLastModifiedTime());
Date date = cal.getTime();

Alternative approaches I thought of included pre-iterating the file set and
extracting all the modification times into an entryDateSet. There are further
complications here, which I'll leave out for now, as my preferred option is the
one I demonstrated above.

It imposes the least amount of change on its users and leaves a bit of headroom
for extra fields if we need them in the future.

Original comment by jtha...@gmail.com on 21 Jul 2007 at 1:38

GoogleCodeExporter commented 9 years ago
I like the option of getting the lastModTime.

We will still need to get this information into the Version object.

Original comment by rajesh.v...@gmail.com on 21 Jul 2007 at 1:43

GoogleCodeExporter commented 9 years ago
One thing we probably need to consider is the range of dates across the
InputData objects. In most cases we have only one JAR file. However, what if we
have multiple JAR files? Should we use the highest value or the lowest value?
We may need to store both the high and low values in the version, but use the
high value as the default date for most reporting purposes.

The advantage of having the low value is mainly as a data integrity check, to
make sure that this range is not too big (more than a day at most). We also have
to make sure that we only take dates from input data sets where at least ONE
file has been used for the Version (i.e. files excluded by the EXCLUDE directive
do not count). This can certainly be done at a later date.
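A minimal sketch of the high/low idea, assuming each file's modification time is known along with whether it was used for the Version (i.e. not excluded by an EXCLUDE directive). `VersionDateRange`, `FileDate`, `lowAndHigh` and `rangeLooksValid` are hypothetical names; the one-day threshold is the one suggested above.

```java
import java.util.List;

public class VersionDateRange {
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    // Hypothetical per-file record: modification time and whether the file
    // was kept for the Version (false if it matched an EXCLUDE directive).
    public static final class FileDate {
        final long lastModified;
        final boolean used;

        public FileDate(long lastModified, boolean used) {
            this.lastModified = lastModified;
            this.used = used;
        }
    }

    // Returns {low, high} over the files actually used, or null if none qualify.
    public static long[] lowAndHigh(List<FileDate> files) {
        long low = Long.MAX_VALUE;
        long high = Long.MIN_VALUE;
        boolean any = false;
        for (FileDate f : files) {
            // Skip excluded files and unknown times (ZipEntry.getTime() returns -1)
            if (!f.used || f.lastModified < 0) {
                continue;
            }
            low = Math.min(low, f.lastModified);
            high = Math.max(high, f.lastModified);
            any = true;
        }
        return any ? new long[] {low, high} : null;
    }

    // Integrity check: the spread between low and high should not exceed one day
    public static boolean rangeLooksValid(long[] lowHigh) {
        return lowHigh != null && lowHigh[1] - lowHigh[0] <= ONE_DAY_MS;
    }
}
```

The high value would then serve as the Version's default date, with the low value kept only for the integrity check.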

Original comment by rajesh.v...@gmail.com on 21 Jul 2007 at 1:47

GoogleCodeExporter commented 9 years ago
I should also add...

We would need to add a 'name' field to the InputData so we know which class it
represents.

Not entirely sure what to do about the "high / low problem" you identified.
I would need to think about that some more.

We now have an InputDataSet that represents a Version and an InputData file
that represents a class.

The high/low modification times of an InputDataSet can be assigned to the
Version when it is being extracted. Likewise, I'll set the ClassMetricData
modification time field from an InputData file in the ClassMetricExtractor.

Because I ultimately want to be able to change the time period, I'm thinking
I'll implement it as a complex metric, initially implementing only the 'DAYS'
format. I will leave other conversions, like those I suggested in Comment 1,
for another task at a later date.

For a class I was thinking of adding a 'date' field to its vocabulary, so you
could do something like...
ClassMetricData cmd = //.....
cmd.getComplexMetric(ClassMetric.DATE); // returns the day it was added.

Based on this, for reporting purposes it would be handy to have a utility to
compress a History into a smaller time frame.

HistoryMetricData hmd = //....
HistoryCompressor hc = //...

hc.compressTo(hmd, DATE.DAYS);
hc.compressTo(hmd, DATE.WEEKS);
hc.compressTo(hmd, DATE.MONTHS);
hc.compressTo(hmd, DATE.YEARS);

or maybe called something like...

HistoryViewer hv = //...
hv.viewHistory(hmd, DATE.DAYS);
hv.viewHistory(hmd, DATE.WEEKS);
hv.viewHistory(hmd, DATE.MONTHS);
hv.viewHistory(hmd, DATE.YEARS);
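As a sketch of what compressTo/viewHistory might do, the following buckets a per-release history into days, weeks, months or years. It assumes the history is available as a map from release date to an integer metric value and keeps the value of the latest release in each period; `HistoryBuckets`, `Period` and `compress` are hypothetical names, and weeks are assumed to start on Monday.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Map;
import java.util.TreeMap;

public class HistoryBuckets {
    public enum Period { DAYS, WEEKS, MONTHS, YEARS }

    // Returns the first day of the period that contains the given date
    static LocalDate periodStart(LocalDate d, Period p) {
        switch (p) {
            case DAYS:
                return d;
            case WEEKS:
                return d.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            case MONTHS:
                return d.withDayOfMonth(1);
            case YEARS:
                return d.withDayOfYear(1);
            default:
                throw new IllegalArgumentException(p.toString());
        }
    }

    // Compresses a per-release series (release date -> metric value) into one
    // value per period, keeping the value of the latest release in each period.
    public static TreeMap<LocalDate, Integer> compress(Map<LocalDate, Integer> history, Period p) {
        TreeMap<LocalDate, Integer> sorted = new TreeMap<>(history); // ascending by date
        TreeMap<LocalDate, Integer> out = new TreeMap<>();
        for (Map.Entry<LocalDate, Integer> e : sorted.entrySet()) {
            // Later releases in the same period overwrite earlier ones
            out.put(periodStart(e.getKey(), p), e.getValue());
        }
        return out;
    }
}
```

Keeping the last value suits cumulative size metrics such as class count; a per-release metric such as the number of modified classes would more likely need the values in each period summed instead.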

An alternative solution might be a decorator that changes the way MetricData is
presented to a Report. It would wrap a MetricData object to provide the
specified view. This falls outside the scope of the initial implementation,
though, so I will leave it for another task at a later date.

Original comment by jtha...@gmail.com on 21 Jul 2007 at 2:36

GoogleCodeExporter commented 9 years ago
Actually, our complex metrics are doubles, so that will not work.
Simple metrics are ints.
Properties are strings.

I'll have to add it as a separate field with its own getter method. Not a big
issue; I was just trying to keep it consistent with the way we use MetricData
in the rest of the system.

Original comment by jtha...@gmail.com on 21 Jul 2007 at 2:41

GoogleCodeExporter commented 9 years ago
The InputData class has been developed, along with the appropriate changes to
the Iterator of the InputDataSet. The dates are now being extracted, and during
post-processing an extra function, computeHighAndLowDates, computes the high and
low dates for a version.

However, this information is not currently persisted to file and is therefore
not yet used.

Original comment by jtha...@gmail.com on 28 Aug 2007 at 4:30